NOVEL DNA GLYCOSYLASES AND THEIR USE

This invention relates to new DNA-glycosylases, in
particular new cytosine-, thymine- and uracil-DNA
glycosylases, and their use for mutagenesis, for DNA
modification and cell killing.

Damage to DNA arises continually throughout the cell
cycle and must be recognised and repaired prior to the
next round of replication to maintain the genomic
integrity of the cell. DNA base damage can be
recognised and excised by the ATP-dependent nucleotide
excision repair systems or by base excision repair
systems exemplified by the DNA glycosylases.

DNA glycosylases are enzymes that occur normally in
cells. They release bases from DNA by cleaving the bond
between deoxyribose and the base in DNA. Naturally
occurring glycosylases remove damaged or incorrectly
placed bases. This base excision repair pathway is the
major cellular defence mechanism against spontaneous DNA
damage.

DNA glycosylases which have been identified are
directed to specific bases or modified bases. An
example of a DNA glycosylase which recognizes an
unmodified base is uracil DNA glycosylase (UDG), which
specifically recognises uracil in DNA and initiates base
excision repair by hydrolysing the N-C1' glycosylic bond
linking the uracil base to the deoxyribose sugar. This
creates an abasic site that is removed by a 5'-acting
apurinic/apyrimidinic (AP) endonuclease and a
deoxyribophosphodiesterase, leaving a gap which is
filled by DNA polymerase and closed by DNA ligase.

The activity of UDG serves to remove uracil which
arises in DNA as a result of incorporation of dUMP
instead of dTMP during replication or from the
spontaneous deamination of cytosine. Deamination of
cytosine to uracil creates a premutagenic U:G mismatch
that, unless repaired, will cause a GC→AT transition
mutation.
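
The route from an unrepaired U:G mismatch to a GC→AT transition can be traced with a short sketch. The five-base strand and the helper names `deaminate` and `replicate` are hypothetical and serve only as illustration; strand polarity is ignored, so position i of each copy pairs with position i of its template.

```python
# Base pairing followed during replication; uracil templates adenine
# in the same way as thymine does.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G", "U": "A"}


def deaminate(strand, index):
    """Return the strand with the cytosine at `index` converted to uracil."""
    if strand[index] != "C":
        raise ValueError("only cytosine is deaminated to uracil")
    return strand[:index] + "U" + strand[index + 1:]


def replicate(template):
    """Return the strand templated base-by-base by `template`."""
    return "".join(PAIRS[base] for base in template)


top = "ATGCA"                        # hypothetical strand with C at index 3
bottom = replicate(top)              # partner strand, G opposite the C
damaged = deaminate(top, 3)          # U now sits opposite G (U:G mismatch)
daughter = replicate(damaged)        # A is inserted opposite the U
granddaughter = replicate(daughter)  # T now occupies the original C position

print(top[3] + ":" + bottom[3], "->", granddaughter[3] + ":" + daughter[3])
```

If the uracil is excised and repaired before replication, the `damaged` strand never serves as a template and the original C:G pair is preserved.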

In vivo, UDGs specifically recognise and remove
uracil from within DNA and cleave the glycosylic bond to
initiate the uracil excision pathway. In vitro, UDGs
can recognise and remove uracil from both
single-stranded DNA (ssDNA) and double-stranded DNA
(dsDNA) substrates.

UDGs are ubiquitous enzymes and have been isolated
from a number of sources. Amino acid sequencing reveals
that the enzymes are conserved throughout evolution with
greater than about 55% amino acid identity between human
and bacterial proteins. A cDNA for human UDG has been
cloned and the corresponding gene has been named UNG
(Olsen et al. (1989) EMBO J., 8: 3121-3125).

The crystal structures of the human enzyme (Mol et
al. (1995) Cell, 80: 869-878) and the herpes simplex
virus enzymes (Savva et al. (1995) Nature, 373: 487-493)
have recently been determined and reveal that uracil
binds in a rigid pocket at the base of the DNA binding
groove of human UDG. The absolute specificity of the
enzyme for uracil over the structurally related DNA
bases thymine and cytosine is conferred by shape
complementarity, as well as main chain and side chain
hydrogen bonds.

Although UDGs do not have activity against other
bases as a result of the afore-mentioned specific
spatial and charge characteristics of the active site,
other glycosylases with different activities have been
identified, which may or may not be restricted to single
substrates.

A naturally-occurring thymine-DNA glycosylase has
been identified which in addition to releasing thymine
also releases uracil (Nedderman & Jiricny (1993) J.
Biol. Chem., 268: 21218-21224; Nedderman & Jiricny
(1994) Proc. Natl. Acad. Sci. U.S.A., 91: 1642-1646).
This thymine-DNA glycosylase however has activity in
respect of only certain substrates and has an absolute
requirement for a mismatched U or T opposite a G in a
double-stranded substrate and will not recognise T or U
from T(U):A matches or a single-stranded substrate. DNA
glycosylases which recognize and release unmodified
bases other than uracil and thymine (in certain
substrates, as mentioned above) have not been
identified.

A DNA glycosylase recognizing unmodified cytosine
has not been reported, although a
5-hydroxymethylcytosine-DNA glycosylase activity was
detected in mammalian cells (Cannon et al. (1988)
Biochem. Biophys. Res. Commun., 151: 1173-1179). The
sequences of the afore-mentioned thymine and
5-hydroxymethylcytosine DNA glycosylases have not yet been
reported and it is unknown whether their active site may
be structurally related to UDG.

It has now surprisingly been found that the
substitution of certain of the UDG amino acids has a
profound effect on the substrate specificity of the
glycosylase. In particular, the replacement of Asn204
by Asp204 results in the production of a mutant enzyme
which has acquired cytosine-DNA glycosylase (CDG)
activity, while retaining some UDG activity.
Alternatively, replacing Tyr147 with Ala147 allows for
binding of thymine, resulting in an enzyme that has
acquired thymine-DNA glycosylase (TDG) activity.

These new DNA glycosylases are not product-inhibited
by added uracil, in contrast to UDG and other
UDG-mutants. Compared with the efficiency of wild type UDG
in removal of uracil, the activity of the new DNA
glycosylases that remove normal pyrimidines in DNA is
low, but distinct and easily detectable. However, it
should be noted that the very high turnover of UDG
appears to be unique among DNA glycosylases and turnover
numbers of other DNA glycosylases may be as low as, or
even lower than, those of the engineered glycosylases CDG
and TDG. This may result from the narrow substrate
specificity of UDG.

Furthermore, an additional new UDG has been
identified. The complete sequence of the UNG gene was
recently published (Haug et al., 1996, Genomics, 36,
p408-416). As mentioned previously, cDNA to this UNG
gene has been identified by Olsen et al., 1989, supra
(hereinafter referred to as UNG1 cDNA and the expressed
protein referred to as UNG1). It has now surprisingly
been found that alternative splicing of the genomic DNA
(UNG) with an exon located 5' of exon 1 which was not
previously recognized results in a new distinct cDNA
with an open reading frame of 313 amino acids. The new
UNG cDNA is referred to hereinafter as UNG2 cDNA, and
the product which it encodes, UNG2. The latter protein
has a predicted size of 36 kDa.

UNG2 differs from the previously known form (UNG1,
ORF 304 amino acid residues) in the 44 amino acids of
the N-terminal presequence, which is not necessary for
catalytic activity. The rest of the presequence and the
catalytic domain, altogether 269 amino acids, are
identical. The alternative presequence in UNG2 arises
by splicing of a previously unrecognized exon (exon 1A)
into a consensus splice site after codon 35 in exon 1B
(previously designated exon 1). The UNG1 presequence
starts at codon 1 in exon 1B and thus has 35 amino acids
not present in UNG2. Coupled transcription/translation
in rabbit reticulocyte lysates demonstrated that both
proteins are catalytically active. Similar forms of
UNG1 and UNG2 are expressed in mouse, which has an
identical organization of the homologous gene.
Furthermore, the presequence of a putative Xiphophorus
UNG2 protein predicted from the gene structure is
homologous to mammalian UNG2, but much shorter,
suggesting a very high degree of conservation from fish
to man.

The invention therefore provides a DNA glycosylase
capable of releasing cytosine bases from single stranded
(ss) DNA and/or double stranded (ds) DNA or thymine
bases from both single stranded (ss) DNA and double
stranded (ds) DNA or from single stranded (ss) DNA or
uracil bases from single stranded (ss) DNA and/or double
stranded (ds) DNA, wherein said uracil-DNA glycosylase
is encoded by a nucleic acid molecule comprising the
sequence (SEQUENCE I.D. Nos 1 and 2):

1 CACAGCCACA GCCAGGGCTA GCCTCGCCGG TTCCCGGGTG GCGCGCGTTC GCTGCCTCCT

61 CAGCTCCAGG ATGATCGGCC AGAAGACGCT CTACTCCTTT TTCTCCCCCA GCCCCGCCAG
MIG QKT LYSF FSP SPA

121 GAAGCGACAC GCCCCCAGCC CCGAGCCGGC CGTCCAGGGG ACCGGCGTGG CTGGGGTGCC
RKRH APS PEP AVQG TGV AGV

181 TGAGGAAAGC GGAGATGCGG CGGCCATCCC AGCCAAGAAG GCCCCGGCTG GGCAGGAGGA
PEES GDA AAI PAKK APA GQE

241 GCCTGGGACG CCGCCCTCCT CGCCGCTGAG TGCCGAGCAG TTGGACCGGA TCCAGAGGAA
EPGT PPS SPL SAEQ LDR IQR

301 CAAGGCCGCG GCCCTGCTCA GACTCGCGGC CCGCAACGTG CCCGTGGGCT TTGGAGAGAG
NKAA ALL RLA ARNV PVG FGE

361 CTGGAAGAAG CACCTCAGCG GGGAGTTCGG GAAACCGTAT TTTATCAAGC TAATGGGATT
SWKK HLS GEF GKPY FIK LMG

421 TGTTGCAGAA GAAAGAAAGC ATTACACTGT TTATCCACCC CCACACCAAG TCTTCACCTG
FVAE ERK HYT VYPP PHQ VFT

481 GACCCAGATG TGTGACATAA AAGATGTGAA GGTTGTCATC CTGGGACAGG ATCCATATCA
WTQM CDI KDV KVVI LGQ DPY

541 TGGACCTAAT CAAGCTCACG GGCTCTGCTT TAGTGTTCAA AGGCCTGTTC CGCCTCCGCC
HGPN QAH GLC FSVQ RPV PPP

601 CAGTTTGGAG AACATTTATA AAGAGTTGTC TACAGACATA GAGGATTTTG TTCATCCTGG
PSLE NIY KEL STDI EDF VHP

661 CCATGGAGAT TTATCTGGGT GGGCCAAGCA AGGTGTTCTC CTTCTCAACG CTGTCCTCAC
GHGD LSG WAK QGVL LLN AVL

721 GGTTCGTGCC CATCAAGCCA ACTCTCATAA GGAGCGAGGC TGGGAGCAGT TCACTGATGC
TVRA HQA NSH KERG WEQ FTD

781 AGTTGTGTCC TGGCTAAATC AGAACTCGAA TGGCCTTGTT TTCTTGCTCT GGGGCTCTTA
AVVS WLN QNS NGLV FLL WGS

841 TGCTCAGAAG AAGGGCAGTG CCATTGATAG GAAGCGGCAC CATGTACTAC AGACGGCTCA
YAQK KGS AID RKRH HVL QTA

901 TCCCTCCCCT TTGTCAGTGT ATAGAGGGTT CTTTGGATGT AGACACTTTT CAAAGACCAA
HPSP LSV YRG FFGC RHF SKT

961 TGAGCTGCTG CAGAAGTCTG GCAAGAAGCC CATTGACTGG AAGGAGCTGT GATCATCAGC
NELL QKS GKK PIDW KEL

1021 TGAGGGGTGG CCTTTGAGAA GCTGCTGTTA ACGTATTTGC CAGTTACGAA GTTCCACTGA

1081 AAATTTTCCT ATTAATTCTT AAGTACTCTG CATAAGGGGG AAAAGCTTCC AGAAAGCAGC

1141 CATGAACCAG GCTGTCCAGG AATGGCAGCT GTATCCAACC ACAAACAACA AAGGCTACCC

1201 TTTGACCAAA TGTCTTTCTC TGCAACATGG CTTCGGCCTA AAATATGCAG AAGACAGATG

1261 AGGTCAAATA CTCAGTTGGC TCTCTTTATC TCCCTTGCCT TTATGGTGAA ACAGGGGAGA

1321 TGTGCACCTT TCAGGCACAG CCCTAGTTTG GCGCCTGCTG CTCCTTGGTT TTGCCTGGTT

1381 AGACTTTCAG TGACAGATGT TGGGGTGTTT TTGCTTAGAA AGGTCCCCTT GTCTCAGCCT

1441 TGCAGGGCAG GCATGCCAGT CTCTGCCAGT TCCACTGCCC CCTTGATCTT TGAAGGAGTC

1501 CTCAGGCCCC TCGCAGCATA AGGATGTTTT GCAACTTTCC AGAATCTGGC CCAGAAATTA

1561 GGGCTCAATT TCCTGATTGT AGTAGAGGTT AAGATTGCTG TGAGCTTTAT CAGATAAGAG

1621 ACCGAGAGAA GTAAGCTGGG TCTTGTTATT CCTTGGGTGT TGGTGGAATA AGCAGTGGAA

1681 TTTGAACAAG GAAGAGGAGA AAAGGGAATT TTGTCTTTAT GGGGTGGGGT GATTTTCTCC

1741 TAGGGTTATG TCCAGTTGGG GTTTTTAAGG CAGCACAGAC TGCCAAGTAC TGTTTTTTTT

1801 AACCGACTGA AATCACTTTG GGATATTTTT TCCTGCAACA CTGGAAAGTT TTAGTTTTTT

1861 AAGAAGTACT CATGCAGATA TATATATATA TATTTTTCCC AGTCCTTTTT TTAAGAGACG

1921 GTCTTTATTG GGTCTGCACC TCCATCCTTG ATCTTGTTAG CAATGCTGTT TTTGCTGTTA

1981 GTCGGGTTAG AGTTGGCTCT ACGCGAGGTT TGTTAATAAA AGTTTGTTAA AAGTTCAAAA

2041 AAAAAAAAAA AAA

or a fragment thereof encoding a catalytically active
product comprising at least nucleotides 121 to 130,
preferably 71 to 202 in addition to the catalytic
domain, or a sequence which is degenerate, substantially
homologous with or which hybridizes with at least
nucleotides 121 to 130, preferably 71 to 202 of any such
aforesaid sequence.

In particular, viewed from one aspect, the invention
can be seen as providing a cytosine-DNA glycosylase
(CDG) capable of releasing cytosine bases from ssDNA
and/or dsDNA.

A further aspect of the invention provides a
cytosine-DNA glycosylase (CDG) capable of releasing both
cytosine and uracil bases from ssDNA and/or dsDNA.

Preferably, the cytosine-DNA glycosylase is one
derived from a UDG and especially from the human UDG
protein which has Asn at amino acid position 204. In
particular, the novel CDG of the invention is preferably
derived from human UDG and has an amino acid
substitution or modification at position 204.
Modification of UDG from other species at an equivalent
residue is similarly preferred. Especially preferably,
the glycosylase is human UDG having an aspartic acid
residue (Asp) at position 204.

Another aspect of the invention provides a
thymine-DNA glycosylase (TDG) capable of releasing thymine
bases from both ssDNA and dsDNA.

A further aspect of the invention provides a 
thymine-DNA glycosylase (TDG) capable of releasing both 
thymine and uracil bases from both ssDNA and dsDNA. 

Yet further aspects of the invention provide a
thymine-DNA glycosylase (TDG) capable of releasing
thymine bases from A:T DNA pairs and a thymine-DNA
glycosylase (TDG) capable of releasing thymine bases
from single stranded DNA.

Preferably, the thymine-DNA glycosylase is one
derived from a UDG, and especially from the human UDG
protein which has Tyr at amino acid position 147. In
particular, the novel TDG of the invention is preferably
derived from human UDG and has an amino acid
substitution or modification at position 147.
Modification of UDG from other species at an equivalent
residue is similarly preferred. Especially preferably,
the glycosylase is human UDG having an alanine residue
(Ala) at position 147.
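
The following sketch locates residues 147 and 204 in the UNG2 translation. It assumes, and the text does not state this outright, that these positions are counted on the 304-residue UNG1 form, so that the corresponding UNG2 positions are shifted by the difference in presequence length (44 - 35 = 9). The two 20-residue windows are copied from the translation lines of SEQUENCE I.D. No 2; the function names are hypothetical.

```python
UNG1_PRESEQUENCE = 35  # UNG1 residues not present in UNG2 (from the text)
UNG2_PRESEQUENCE = 44  # UNG2 residues not present in UNG1 (from the text)
OFFSET = UNG2_PRESEQUENCE - UNG1_PRESEQUENCE  # shift from UNG1 to UNG2 numbering

# Windows of the UNG2 translation, keyed by the UNG2 position of the
# first residue of each window (rows 481 and 661 of the listing).
UNG2_WINDOWS = {
    137: "WTQMCDIKDVKVVILGQDPY",
    197: "GHGDLSGWAKQGVLLLNAVL",
}


def ung1_to_ung2(position):
    """Map a UNG1 position in the shared 269-residue region to UNG2."""
    if position <= UNG1_PRESEQUENCE:
        raise ValueError("position lies in the UNG1-specific presequence")
    return position + OFFSET


def substitute(ung1_position, expected, replacement):
    """Return the window containing the residue, with the substitution made."""
    position = ung1_to_ung2(ung1_position)
    for start, window in UNG2_WINDOWS.items():
        index = position - start
        if 0 <= index < len(window):
            if window[index] != expected:
                raise ValueError("unexpected residue " + window[index])
            return window[:index] + replacement + window[index + 1:]
    raise KeyError("position not covered by the stored windows")


cdg_window = substitute(204, "N", "D")  # Asn204 -> Asp204
tdg_window = substitute(147, "Y", "A")  # Tyr147 -> Ala147
print(ung1_to_ung2(204), cdg_window)
print(ung1_to_ung2(147), tdg_window)
```

Under this assumption the residue check succeeds for both positions: Asn is found at UNG2 position 213 and Tyr at UNG2 position 156.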

A yet further aspect of the invention provides a
uracil-DNA glycosylase encoded by a nucleic acid
molecule comprising the sequence (SEQUENCE I.D. Nos 1 and
2):

1 CACAGCCACA GCCAGGGCTA GCCTCGCCGG TTCCCGGGTG GCGCGCGTTC GCTGCCTCCT

61 CAGCTCCAGG ATGATCGGCC AGAAGACGCT CTACTCCTTT TTCTCCCCCA GCCCCGCCAG
MIG QKT LYSF FSP SPA

121 GAAGCGACAC GCCCCCAGCC CCGAGCCGGC CGTCCAGGGG ACCGGCGTGG CTGGGGTGCC
RKRH APS PEP AVQG TGV AGV

181 TGAGGAAAGC GGAGATGCGG CGGCCATCCC AGCCAAGAAG GCCCCGGCTG GGCAGGAGGA
PEES GDA AAI PAKK APA GQE

241 GCCTGGGACG CCGCCCTCCT CGCCGCTGAG TGCCGAGCAG TTGGACCGGA TCCAGAGGAA
EPGT PPS SPL SAEQ LDR IQR

301 CAAGGCCGCG GCCCTGCTCA GACTCGCGGC CCGCAACGTG CCCGTGGGCT TTGGAGAGAG
NKAA ALL RLA ARNV PVG FGE

361 CTGGAAGAAG CACCTCAGCG GGGAGTTCGG GAAACCGTAT TTTATCAAGC TAATGGGATT
SWKK HLS GEF GKPY FIK LMG

421 TGTTGCAGAA GAAAGAAAGC ATTACACTGT TTATCCACCC CCACACCAAG TCTTCACCTG
FVAE ERK HYT VYPP PHQ VFT

481 GACCCAGATG TGTGACATAA AAGATGTGAA GGTTGTCATC CTGGGACAGG ATCCATATCA
WTQM CDI KDV KVVI LGQ DPY

541 TGGACCTAAT CAAGCTCACG GGCTCTGCTT TAGTGTTCAA AGGCCTGTTC CGCCTCCGCC
HGPN QAH GLC FSVQ RPV PPP

601 CAGTTTGGAG AACATTTATA AAGAGTTGTC TACAGACATA GAGGATTTTG TTCATCCTGG
PSLE NIY KEL STDI EDF VHP

661 CCATGGAGAT TTATCTGGGT GGGCCAAGCA AGGTGTTCTC CTTCTCAACG CTGTCCTCAC
GHGD LSG WAK QGVL LLN AVL

721 GGTTCGTGCC CATCAAGCCA ACTCTCATAA GGAGCGAGGC TGGGAGCAGT TCACTGATGC
TVRA HQA NSH KERG WEQ FTD

781 AGTTGTGTCC TGGCTAAATC AGAACTCGAA TGGCCTTGTT TTCTTGCTCT GGGGCTCTTA
AVVS WLN QNS NGLV FLL WGS

841 TGCTCAGAAG AAGGGCAGTG CCATTGATAG GAAGCGGCAC CATGTACTAC AGACGGCTCA
YAQK KGS AID RKRH HVL QTA

901 TCCCTCCCCT TTGTCAGTGT ATAGAGGGTT CTTTGGATGT AGACACTTTT CAAAGACCAA
HPSP LSV YRG FFGC RHF SKT

961 TGAGCTGCTG CAGAAGTCTG GCAAGAAGCC CATTGACTGG AAGGAGCTGT GATCATCAGC
NELL QKS GKK PIDW KEL

1021 TGAGGGGTGG CCTTTGAGAA GCTGCTGTTA ACGTATTTGC CAGTTACGAA GTTCCACTGA

1081 AAATTTTCCT ATTAATTCTT AAGTACTCTG CATAAGGGGG AAAAGCTTCC AGAAAGCAGC

1141 CATGAACCAG GCTGTCCAGG AATGGCAGCT GTATCCAACC ACAAACAACA AAGGCTACCC

1201 TTTGACCAAA TGTCTTTCTC TGCAACATGG CTTCGGCCTA AAATATGCAG AAGACAGATG

1261 AGGTCAAATA CTCAGTTGGC TCTCTTTATC TCCCTTGCCT TTATGGTGAA ACAGGGGAGA

1321 TGTGCACCTT TCAGGCACAG CCCTAGTTTG GCGCCTGCTG CTCCTTGGTT TTGCCTGGTT

1381 AGACTTTCAG TGACAGATGT TGGGGTGTTT TTGCTTAGAA AGGTCCCCTT GTCTCAGCCT

1441 TGCAGGGCAG GCATGCCAGT CTCTGCCAGT TCCACTGCCC CCTTGATCTT TGAAGGAGTC

1501 CTCAGGCCCC TCGCAGCATA AGGATGTTTT GCAACTTTCC AGAATCTGGC CCAGAAATTA

1561 GGGCTCAATT TCCTGATTGT AGTAGAGGTT AAGATTGCTG TGAGCTTTAT CAGATAAGAG

1621 ACCGAGAGAA GTAAGCTGGG TCTTGTTATT CCTTGGGTGT TGGTGGAATA AGCAGTGGAA

1681 TTTGAACAAG GAAGAGGAGA AAAGGGAATT TTGTCTTTAT GGGGTGGGGT GATTTTCTCC

1741 TAGGGTTATG TCCAGTTGGG GTTTTTAAGG CAGCACAGAC TGCCAAGTAC TGTTTTTTTT

1801 AACCGACTGA AATCACTTTG GGATATTTTT TCCTGCAACA CTGGAAAGTT TTAGTTTTTT

1861 AAGAAGTACT CATGCAGATA TATATATATA TATTTTTCCC AGTCCTTTTT TTAAGAGACG

1921 GTCTTTATTG GGTCTGCACC TCCATCCTTG ATCTTGTTAG CAATGCTGTT TTTGCTGTTA

1981 GTCGGGTTAG AGTTGGCTCT ACGCGAGGTT TGTTAATAAA AGTTTGTTAA AAGTTCAAAA

2041 AAAAAAAAAA AAA

or a fragment thereof encoding a catalytically active
product comprising at least nucleotides 121 to 130,
preferably 71 to 202 in addition to the catalytic
domain, or a sequence which is degenerate, substantially
homologous with or which hybridizes with at least
nucleotides 121 to 130, preferably 71 to 202 of any such
aforesaid sequence. Preferably such degeneracy,
homology or hybridization applies to the entire
sequence.

"Catalytically active product" as used herein refers
to any product encoded by said sequence which exhibits
uracil DNA glycosylase activity.

"Substantially homologous" as used herein includes
those sequences having a sequence homology of
approximately 60% or more, e.g. 70% or 80% or more, and
also functionally-equivalent allelic variants and
related sequences modified by single or multiple base
substitution, addition and/or deletion. By
"functionally equivalent" in this sense is meant
nucleotide sequences which encode catalytically active
polypeptides, i.e. having uracil DNA glycosylase
activity.
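
The 60% threshold can be illustrated with a minimal sketch. The text does not say how sequence homology is to be calculated, so the measure used here (percentage of identical positions between two pre-aligned sequences of equal length) is an assumption, as are the function names and the variant sequence.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of identical positions in two pre-aligned sequences.

    Both sequences must have the same length; a gap character ("-")
    never counts as a match.
    """
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    pairs = zip(seq_a.upper(), seq_b.upper())
    matches = sum(1 for a, b in pairs if a == b and a != "-")
    return 100.0 * matches / len(seq_a)


def is_substantially_homologous(seq_a, seq_b, threshold=60.0):
    """Apply the approximately 60% lower limit given in the text."""
    return percent_identity(seq_a, seq_b) >= threshold


reference = "GAAGCGACAC"  # nucleotides 121 to 130 of SEQUENCE I.D. No 1
variant = "GAAGCGTCAT"    # hypothetical variant with two substitutions

print(percent_identity(reference, variant))             # 80.0
print(is_substantially_homologous(reference, variant))  # True
```

An alignment step that handles insertions and deletions would be needed before applying such a measure to sequences of different length.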

Sequences which "hybridize" are those sequences
binding under non-stringent conditions (e.g. 6 x SSC, 50%
formamide at room temperature) and washed under
conditions of low stringency (e.g. 2 x SSC, room
temperature, more preferably 2 x SSC, 42°C) or
conditions of higher stringency (e.g. 2 x SSC, 65°C)
(where SSC = 0.15M NaCl, 0.015M sodium citrate, pH 7.2).
Generally speaking, sequences which hybridize under
conditions of high stringency are included within the
scope of the invention, as are sequences which, but for
the degeneracy of the code, would hybridize under high
stringency conditions.
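
For convenience, the salt concentrations implied by the SSC strengths quoted above can be worked out from the stated definition of 1 x SSC. This is a simple arithmetic sketch; the function name is hypothetical.

```python
# 1 x SSC as defined in the text, in mol/l.
SSC_1X = {"NaCl": 0.15, "sodium citrate": 0.015}


def ssc_molarity(strength):
    """Return the molar concentration of each solute at `strength` x SSC."""
    return {solute: round(strength * molar, 4) for solute, molar in SSC_1X.items()}


binding = ssc_molarity(6)  # 6 x SSC used for binding
washing = ssc_molarity(2)  # 2 x SSC used for all quoted washes
print(binding)
print(washing)
```

In the quoted conditions the wash stringency is raised by temperature (room temperature, 42°C, 65°C) while the salt concentration stays at 2 x SSC.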

The significance of the UNG1 and UNG2 presequences has
also been investigated in the present invention, by the
use of constructs that express fusion products of UNG1
or UNG2 and green fluorescent protein (EGFP).
Surprisingly, significant effects on subcellular
targeting were observed and after transient transfection
of HeLa cells, the pUNG1-EGFP-N1 product co-localized
with mitochondria whereas the pUNG2-EGFP-N1 product
targeted exclusively to nuclei. Whilst not wishing to
be bound by theory, it appears that these sequences may
be instrumental in the localization of the enzymes. The
putative nuclear signal was identified as RKRH which
also appears in the catalytic domain of both UNG1 and
UNG2. Whilst it was recognized previously by Slupphaug
et al., 1993, Nucl. Acids Res., 21(11), p2579-2584, that
the signal for mitochondrial translocation resides in
the UNG1 presequence, it was believed that the signal
for nuclear import lay within the mature protein as in
the absence of the presequence, UNG1 was transported to
the nucleus. However, UNG2 has now been identified
which has a presequence and which localizes to the
nucleus. These presequences thus have utility for
directing the subcellular localization of molecules
attached to them.

Thus, viewed from a further aspect, the invention
provides nuclear localization peptides encoded by a
nucleic acid molecule comprising the sequence (SEQUENCE
I.D. Nos 3 and 4):
ATGATCGGCC AGAAGACGCT CTACTCCTTT TTCTCCCCCA GCCCCGCCAG
MIG QKT LYSF FSP SPA

GAAGCGACAC GCCCCCAGCC CCGAGCCGGC CGTCCAGGGG ACCGGCGTGG CTGGGGTGCC
RKRH APS PEP AVQG TGV AGV

TGAGGAAAGC GGAGATGCGG CG
PEES GDA A

or a fragment thereof encoding a functional equivalent
or a sequence which is degenerate, substantially
homologous with or which hybridizes with any such
aforesaid sequence.

Functionally equivalent fragments refer to products
which may serve as appropriate localization peptides.
Especially preferred nuclear localizing peptides are
those which include the amino acid sequence RKRH.

A further preferred feature of the invention
comprises DNA glycosylases of the invention which
additionally comprise at least one of the aforesaid
nuclear localization peptide sequences or at least one
mitochondrial localization peptide sequence encoded by a
nucleic acid molecule comprising the sequence (SEQUENCE
I.D. Nos 5 and 6):


ATGGGCGTCT TCTGCCTTGG GCCGTGGGGG TTGGGCCGGA AGCTGCGGAC GCCTGGGAAG 
MGV FCL GPWG LGR KLR TPGK 

GGGCCGCTGC AGCTCTTGAG CCGCCTCTGC GGGGACCACT TGCAG 
GPL QLL SRLC GDH LQ


or a fragment thereof encoding a functional equivalent
or a sequence which is degenerate, substantially
homologous with or which hybridizes with any such
aforesaid sequence, e.g. CDG or TDG with a localization
peptide. Such a composite may be prepared for example
by appropriate modification of UNG1 or UNG2.
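
The two presequence coding sequences given above (SEQUENCE I.D. Nos 3 and 5) can be checked against their translations with a short sketch. The standard genetic code is assumed; the helper `translate` and the variable names are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a coding sequence, ignoring whitespace and line breaks."""
    dna = "".join(dna.split()).upper()
    if len(dna) % 3:
        raise ValueError("length is not a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


# SEQUENCE I.D. No 3: UNG2 (nuclear) presequence.
UNG2_PRESEQUENCE_DNA = """
ATGATCGGCC AGAAGACGCT CTACTCCTTT TTCTCCCCCA GCCCCGCCAG
GAAGCGACAC GCCCCCAGCC CCGAGCCGGC CGTCCAGGGG ACCGGCGTGG CTGGGGTGCC
TGAGGAAAGC GGAGATGCGG CG
"""

# SEQUENCE I.D. No 5: UNG1 (mitochondrial) presequence.
UNG1_PRESEQUENCE_DNA = """
ATGGGCGTCT TCTGCCTTGG GCCGTGGGGG TTGGGCCGGA AGCTGCGGAC GCCTGGGAAG
GGGCCGCTGC AGCTCTTGAG CCGCCTCTGC GGGGACCACT TGCAG
"""

ung2_peptide = translate(UNG2_PRESEQUENCE_DNA)
ung1_peptide = translate(UNG1_PRESEQUENCE_DNA)

print(len(ung2_peptide), ung2_peptide, "RKRH" in ung2_peptide)
print(len(ung1_peptide), ung1_peptide, "RKRH" in ung1_peptide)
```

The translated lengths (44 and 35 residues) agree with the presequence lengths stated earlier, and the RKRH motif occurs in the UNG2 presequence but not in the UNG1 presequence.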

The novel DNA glycosylases of the invention
conveniently may be obtained by modification of existing
DNA glycosylase enzymes, such as the human UDG mentioned
above. Such modification, for example by replacement,
addition or deletion of one or more amino acid residues,
or indeed chemical modification of amino acid residues,
may readily be achieved using methods well known in the
art and includes modifications both at the protein level
and also at the level of the encoding nucleic acid. For
example, site-directed mutagenesis techniques are widely
described in the literature. Other conventional
mutagenesis treatments which may be used to obtain
enzymes according to the invention include random or
regional random mutagenesis by chemical agents, such as
N-nitroso compounds, or physical agents, such as
ultraviolet light, as well as random or regional random
mutagenesis by polymerase chain reaction (PCR) methods.
Regional random mutagenesis may be carried out by
subcloning one or more relevant DNA sequences encoding
segments of the starting protein e.g. UDG, followed by
random mutagenesis on this fragment or fragments. After
the fragments have been mutagenized they may be
reinserted into a DNA sequence encoding the starting
protein e.g. UDG. Screening of individual colonies for
novel DNA glycosylases of the invention may then be
performed using assay methods described herein.

Alternatively, the novel DNA glycosylases of the
invention may be obtained by other techniques, for
example polypeptide synthesis, construction of fusion
proteins etc.

DNA glycosylase activity may readily be assayed
according to techniques well known in the art, see for
example Slupphaug et al. (1995) Biochemistry, 34:
128-138, and Nedderman & Jiricny, supra. Assays for DNA
glycosylase may be used for identifying enzymes
according to the invention. The enzymes may be
naturally occurring or formed as the result of
manipulations of naturally occurring gene sequences or
products. Thus, for example, a cell-free extract may be
assayed using a thymine or cytosine-containing substrate
to identify enzymes which perform excision of one or
more of the bases. For the purposes of assessment, the
cytosine and thymine bases in the substrates are
conveniently labelled, for example fluorescently or
radiolabelled e.g. with 3H. Suitable substrates may be
prepared by methods known in the art e.g. by nick
translation, random priming, PCR or chemical synthesis.
To ascertain if the enzymes are also capable of excising
uracil, substrates including uracil may also be used.
Conveniently, the uracil bases should be labelled to
allow detection. Assays for the excision of different
bases are preferably performed independently.

Thus, viewed from a yet further aspect, the
invention provides an assay for the identification of
DNA glycosylases of the invention in a sample, in which
said assay comprises at least the step of assaying for
activity in the sample which is capable of excising
thymine or cytosine and optionally also uracil from an
introduced ssDNA and/or dsDNA substrate. Optionally,
the moiety responsible for such activity may be
isolated. Suitable assays are described herein and are
also known in the art.

DNA glycosylases of the invention include
modifications of human UDG by amino acid replacement, as
mentioned above, especially at positions 204 and 147.
Such amino acid-substituted mutants of human UDG may
also comprise additional modifications, for example
truncation from the N- and/or C-terminal, or chemical
derivation of amino acid residues and/or addition,
deletion or mutation of constituent residues which do
not affect the overall specificity of the enzyme.

Derivatives of UDG or other DNA glycosylase enzymes
from other genera or species, having the CDG or TDG
functional activity mentioned above, are also included
within the scope of the invention. It will be
appreciated that appropriate modification of such
enzymes would be performed on comparable residues to
those in the human enzyme which form part of the active
site and which could be identified by methods known in
the art, e.g. by sequence comparison to human UDG and/or
by mutation of residues which are identified as
potentially conferring specificity to the enzyme and
subsequent substrate specificity analyses of the mutant
enzymes thus obtained.

The novel DNA glycosylases of the invention may have
a number of uses, for example as tools in molecular
biology procedures, most notably in mutagenesis, both in
vitro and in vivo, but also in other areas such as cell
killing, removal of contaminating DNA, random
degradation of DNA, enzymatic DNA sequencing etc.

In light of the identification of mitochondrial and
nuclear localizing peptides, it is now possible to
direct human uracil-DNA glycosylase either to nuclei or
to mitochondria by making constructs containing either a
nuclear localization signal, such as in UNG2, or a
mitochondrial localization signal, such as in UNG1, as
mentioned above. Whilst this alone may be used to
mutate RNA in the cells, this is particularly useful in
combination with site directed mutations that give rise
to mutants that have either TDG activity or CDG activity
because it allows for selective mutagenesis of nuclear
DNA or mitochondrial DNA. Furthermore, it is useful in
a system where either nuclear or mitochondrial DNA is
the target for degradation for the purpose of killing
cells, e.g. cancer cells.

As mentioned above, DNA glycosylases according to
the invention may be used in a mutagenesis system both
in vitro and in vivo. These proteins have numerous
advantages over typical chemical mutagens, particularly
regarding their ease of use. Small molecular mutagens,
such as methylnitrosourea (MNU), methylmethanesulfonate
(MMS) or methylnitrosoguanidine (MNNG) are very toxic on
contact with eyes, skin or mucosal membranes and may
decompose to explosive and volatile toxic compounds.
Other mutagens, such as dimethylnitrosamine and
benzo(a)pyrene require metabolic activation by special
enzymes that are only present in some cells. They can
therefore only be used under certain experimental
conditions and will often require the addition of a
fraction containing activating enzymes. All these
chemical mutagens therefore require specialised
precautions in order to protect the user. One major
advantage of DNA glycosylases according to the invention
is that they are not volatile and are not harmful to the
user, for example, by mere skin contact.

Mutagenesis in vitro may be performed on a complex
sample, e.g. a cell-free extract, a partially refined
sample, e.g. nucleic-acid enriched or purified sample or
on a single population of nucleic acid material, e.g.
amplified nucleic acid material. Random mutation may be
performed using selected DNA glycosylases of the
invention (possibly in combination with one another
and/or with known DNA glycosylases), to release
particular bases or combinations of bases from the
nucleic acid substrate. Removal of the resulting abasic
site and replacement of the removed base with another
base may be performed by provision of appropriate
enzymes and bases.

Specific mutagenesis may be performed in a number of
ways. Depending on the specificity of the DNA
glycosylase for ssDNA or dsDNA, either one or the other
type of DNA may be targeted. One application of such a
method may be to introduce labelled bases into the
target DNA to identify its presence or amount in the
total nucleic acid material. Alternatively, the
substrate which is uniquely recognizable (e.g. dsDNA)
may be made sensitive to digestion or degradation after
release of the appropriate base by DNA glycosylase
activity when replacement of the base has not been
performed. This may then be used to remove certain ss-
or ds-DNA from a sample. Such an application is
discussed in more detail hereinafter.

Another application involves the introduction of
selected bases after release of the specific bases
recognized by the DNA glycosylase. In this way,
replacement of specific bases by specific other bases
may be performed. It is known from the art that the
human UDG has sequence specificity for uracil excision
in the sequence surrounding the uracil base (Slupphaug
et al., 1995, supra). Appropriate selection of enzyme
concentrations and other determinants may be employed to
excise specific bases from known sequences or
alternatively, by replacement with appropriately
labelled bases, to determine the presence of such
sequences in nucleic acid samples.

For mutagenesis in vivo, e.g. in a cell, a
nucleotide sequence encoding a DNA glycosylase according
to the invention under the control of a suitable
expression vector may be introduced into the cell by any
suitable means, for example, by transformation or
through the use of liposomes.

A further aspect of the present invention thus
provides a nucleic acid molecule comprising a nucleotide
sequence which encodes a DNA glycosylase and/or nuclear
localizing peptide of the invention as defined above.
Such nucleic acid molecules may readily be prepared
using conventional techniques well known in the art.
Thus, for example, as already mentioned above, known
gene sequences coding for DNA glycosylases, e.g. the UNG
gene mentioned above, may be modified e.g. by nucleic
acid substitution using standard techniques such as
site-directed mutagenesis.

In further aspects the invention also provides an
expression vector containing a nucleic acid molecule of
the invention, and transformed or transfected host cells
carrying a nucleic acid molecule of the invention.

The expression vector may be any conventional
expression vector known in the art or described in the
literature, including both phage and plasmid vectors.
In general, these will comprise suitable regulatory
sequences e.g. a promoter and/or enhancer operably
connected to a gene expressing the enzyme. Suitable
promoters include SV40 early or late promoter, e.g. pSVL
vector, cytomegalovirus (CMV) promoter and mouse mammary
tumour virus long terminal repeat, although preferably
inducible promoters are used, e.g. mouse metallothionein
I promoter. The vector preferably includes a suitable
marker such as a gene for dihydrofolate reductase or
glutamine synthetase. The expression vector may for
example be an inducible vector, such as the E. coli
vector pTrc99A (see Slupphaug (supra)) inducible with
isopropyl β-D-thiogalactopyranoside (IPTG). Other
suitable expression vectors include any vector carrying
an inducible promoter, such as lac, or bacteriophage
lambda λPL, in which the promoter is under the control
of a temperature sensitive repressor (cI). Examples of
such vectors are pKK223-2 and pPL-Lambda Inducible (from
Pharmacia). The DNA glycosylases of the invention may
also be expressed as fusion proteins. The expression of
such fusion proteins may facilitate purification e.g. by
using a system such as the GST-gene fusion systems,
exemplified by the pGEX vector systems (Pharmacia) or
the fusion proteins with peptide sequences that are
recognized by specific antibodies, exemplified by the
FLAG Expression vectors (Kodak).

The host cell may likewise be any suitable host cell
known in the art, including both eukaryotic cells, e.g.
yeast, mammalian and plant cells, and prokaryotic cells,
e.g. bacteria.

Transfection and transformation techniques are also
well known in the art, as described for example in
Sambrook et al. (1989), Molecular Cloning: A Laboratory
Manual, 2nd Ed., Cold Spring Harbor Laboratory Press,
Cold Spring Harbor, N.Y., as indeed are other techniques
for introducing nucleic acids into cells, for example
using calcium phosphate, DEAE dextran, polybrene,
protoplast fusion, liposomes, electroporation, direct
microinjection, gene cannon etc.

Expression of the DNA glycosylase according to the 
invention results in the release of C or T from the 
cellular DNA, which may lead to transition mutations 
upon replication. 

Mutagenesis of cells, e.g. mammalian cells, may also
be performed by introduction of the DNA glycosylase
protein of the invention into the cell. This may be
performed using for example liposomes or other
appropriate techniques known in the art.

TDG or CDG may also be used to specifically induce
mutations either in the cell nucleus or in the
mitochondria of eukaryotic cells. To obtain mutations in
the nuclear DNA, this may be carried out by expressing a
cDNA with the complete open reading frame of UNG2, in
which the N-terminal amino acid sequence contains a
nuclear localization signal, as described previously, but
with a site directed mutation in codon 204 (preferably
Asn204Asp) or in codon 147 (preferably Tyr147Ala). To
specifically obtain mitochondrial mutations, a cDNA with
the complete open reading frame of UNG1, in which the
N-terminal amino acid sequence contains a mitochondrial
localization signal, as described previously, may be
expressed with similar site directed mutations to those
mentioned above. For this purpose any expression vector
applicable to eukaryotic cells may be used, but
preferably the vector system should be inducible. To
introduce the expression vectors into the cells, any
method for transfection may be used. Alternatively, the
same proteins may be expressed and purified and then
introduced into the cells by liposome technology or other
appropriate techniques in the art as mentioned
previously.

Combined in vitro/in vivo mutagenesis may also be
performed. For example, an isolated restriction
fragment of interest (or possibly the whole plasmid) may
be treated with limited amounts of cytosine-DNA
glycosylase or thymine-DNA glycosylase. Subsequently,
the treated fragment may be reinserted into a vector and
transformed into E. coli cells (the cells may also be
pre-treated with a DNA damaging agent to ensure an
error-prone SOS-repair). As a result of the
mutagenicity of AP-sites, this should yield random
mutations.

The Examples below describe the induction of
mutations in bacterial cells by the expression within
such cells of a DNA glycosylase, i.e. a CDG or TDG
according to the present invention. Expression of the
DNA glycosylases of the invention in the transformed
cells causes an increase in mutation frequencies.
Similar results may be obtained with other cells. To
enhance mutagenesis, strains may be used, including both
prokaryotic and eukaryotic strains, which are defective
in the repair of AP-sites or are otherwise
hypermutable, e.g. bacterial mutants that are defective
in endonuclease IV or exonuclease III, or both, or other
mutants that similarly enhance the yield of mutations.

Thus, the use of one or more DNA glycosylases
according to the invention in in vitro and/or in vivo
mutagenesis systems provides yet further aspects of this
invention.

Another use of DNA glycosylases of the invention
involves DNA modification. By treating any type of DNA
(single or double-stranded) in vitro with a DNA
glycosylase according to the invention, naturally-occurring
C or T will be released, thus leaving an
apyrimidinic site (AP-site). Subsequent treatment of this
DNA with alkaline solutions or enzymes such as
apurinic/apyrimidinic-site endonucleases
(AP-endonucleases) recognising AP-sites will cause breaks
in the DNA at the AP-sites. This method may therefore be
used for the random cleavage of DNA. The number of
cleaved sites will depend on the amount of the DNA
glycosylases according to the invention used, thus
allowing the number of AP-sites and hence breaks to be
controlled. Uses of such methods include the removal of
possible contaminating DNA prior to PCR amplification
and the enzymatic sequencing of DNA. The random
cleavage of DNA can also be used for producing randomly
fragmented DNA of defined size ranges for different
purposes, for example for efficient hybridization of
DNA, for preparing genomic libraries or for removal of
high-molecular weight viscous DNA.
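
The dependence of fragment size on the extent of base
excision can be illustrated with a small simulation. The
following Python sketch is illustrative only and is not part
of the described method: the function name, the per-base
excision probability and the treatment of a single strand
are assumptions, and every AP-site is assumed to be
converted into a strand break.

```python
import random


def fragment_lengths(sequence, excision_prob, targets="CT", seed=0):
    """Return fragment lengths after random cleavage of one DNA strand.

    Each base listed in `targets` is assumed to be excised independently
    with probability `excision_prob`; every resulting AP-site is assumed
    to be cleaved, ending the current fragment at that position.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    lengths = []
    start = 0
    for i, base in enumerate(sequence):
        if base in targets and rng.random() < excision_prob:
            lengths.append(i + 1 - start)  # fragment ends at the AP-site
            start = i + 1
    if start < len(sequence):
        lengths.append(len(sequence) - start)  # remaining 3' fragment
    return lengths


if __name__ == "__main__":
    strand = "ACGT" * 250  # hypothetical 1000-nucleotide strand
    for p in (0.01, 0.05, 0.2):
        frags = fragment_lengths(strand, p)
        print(p, len(frags), sum(frags) / len(frags))
```

In this simplified model a higher excision probability, corresponding
to more enzyme or a longer incubation, gives more and shorter
fragments, which is the sense in which the number of breaks can be
controlled.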

One advantage of using a DNA glycosylase according
to the invention in such methods is that, in contrast to
nucleases, DNA glycosylases do not require divalent
cations, which is advantageous when buffers containing
divalent cations are not desirable. A further advantage
is that the DNA glycosylase may be inactivated by
heating the reaction mixture to 80°C for 15 minutes, thus
eliminating or substantially reducing its activity.

Uracil-DNA glycosylase has previously been shown to
be efficient in removing contaminating DNA prior to PCR
amplification. This method has the disadvantage that
only DNA containing uracil can be removed, which means
that uracil-containing DNA has to be prepared using
appropriate uracil-containing primers to obtain DNA
which can be removed prior to amplification. One
advantage of the DNA glycosylases according to the
present invention is that they do not have this
requirement, as any contaminating DNA would be likely to
contain cytosine or thymine bases. Thus, CDG and/or TDG
according to the invention may be added to a reaction
mix and allowed to digest contaminating DNA. After
treatment the enzyme(s) are inactivated prior to the
addition of the DNA sample and amplification, to avoid
degradation of the template or product.

Thus a further aspect of the invention provides the
use of one or more DNA glycosylases according to the
present invention for removing contaminating DNA prior
to PCR amplification. The use of one or more DNA
glycosylases according to the invention in DNA
modification provides a further aspect of the invention.
The term "modification" as used herein refers to all
forms of modifying or manipulating DNA, including
cleavage, base substitution or insertion etc.

A DNA glycosylase according to the present invention
may also be used in a method for the killing of cells.
A DNA glycosylase according to the present invention may
be introduced into specific target cells by means of
known transformation techniques, liposomes, specific
targeting systems such as ligands that bind to specific
receptors, or any other suitable techniques. The DNA
glycosylase may be expressed in a tissue-specific manner
by placing a tissue-specific promoter upstream of the
DNA sequence encoding a DNA glycosylase according to the
present invention. Examples of such tissue-specific
promoters are well known and are for example found in
genes for a number of liver specific proteins such as
albumin, blood clotting factors and apolipoproteins;
several hormones, such as human growth hormone from the
pituitary gland and insulin from the islets of Langerhans
in the pancreas, as well as aromatase involved in the
estrogen biosynthetic pathway; porphobilinogen deaminase,
which is the third enzyme in the heme biosynthetic
pathway; glycoprotein IIb/IIIa, which is expressed in
maturing megakaryocytes; the zeta subunit of the T-cell
antigen receptor (TCR), which is expressed in T-cells;
CD14, expressed in monocytes and macrophages; villin,
expressed in certain epithelial tissues; and tyrosinase,
expressed in melanocytes and melanomas. In some cases
abnormal expression from tissue specific promoters has
been observed in tumour cells, and this may be exploited
by using constructs of novel DNA glycosylases and the
relevant tissue specific promoter.

When the DNA glycosylase is expressed it may
fragment the DNA in the cell and therefore kill the
cell. Specific cells may also be targeted through the
use of promoters containing other control elements, for
example, promoters which are controlled in a cell-cycle
or temporal manner or those possessing regulatory
elements responsive to internal or external factors,
e.g. promoters activatable by specific inducers, e.g.
the inducer IPTG, which induces the lac promoter or lac
derivatives such as trc, by certain metals (e.g. the
metallothionein promoter), by certain hormones such as
dexamethasone or androgens (acting for example on the
promoter of the gene for prostate specific antigen, which
is tissue specific), by retinoic acid and by certain
cytokines.

Conceivably, where enzymes of the invention exhibit
specific substrate requirements in the sequence
surrounding the base for excision, this specificity may
be employed by appropriate low level expression of the
DNA glycosylase such that only DNA with the specific
sequence is made susceptible to degradation.

Thus a further aspect of the invention provides a
method of killing cells, comprising the steps of
introducing a DNA glycosylase according to the present
invention into a cell and expressing said DNA
glycosylase in the cell to an extent which results in
the killing of that cell. Preferably, the DNA
glycosylase according to the present invention is
contained within an expression vector, most preferably
a tissue-specific expression vector.

A further use of DNA glycosylases of the invention
is for performing enzymatic DNA sequencing. This may be
performed in a manner analogous to the chemical
sequencing method of Maxam and Gilbert (Maxam and
Gilbert (1980) Methods in Enzymology, 65: 499).
However, the Maxam-Gilbert procedure involves the use of
several very toxic chemicals, such as dimethylsulfate
(DMS) and hydrazine (the latter is also explosive), and
use of the glycosylases of the invention presents a
considerable advantage. Enzymatic sequencing may be
performed for example by end-labelling the sample DNA
fragment appropriately, for example with 32P, 33P or 35S.
For identifying the positions of cytosines and thymines
in the DNA, the DNA is treated with limiting amounts of
cytosine-DNA glycosylase and thymine-DNA glycosylase
according to the invention, respectively. The resulting
AP-sites are then cleaved, e.g. by alkaline solution
(pyridine) or by an AP-endonuclease. The resulting
end-labelled fragments are subsequently separated, e.g. by
electrophoresis, and the position of fragments of varying
length identified appropriately, e.g. by
autoradiography. Ideally, the positions of adenines and
guanines should be determined in the same way using
adenine- or guanine-DNA glycosylases. At the present
time such enzymes are not available. However, the E.
coli DNA repair enzymes Tag and AlkA recognize adenine
alkylated in the 3-position (Tag, AlkA) and guanine
alkylated in the 3-position (AlkA). Thus, one way of
determining the positions of adenines and guanines may
be alkylation of the DNA with DMS, followed by
treatment with AlkA and Tag. The subsequent experimental
procedure may be performed as for determining the C and
T positions.
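
The way C and T positions follow from fragment lengths can
be sketched in a few lines of Python. The example below is
an illustration under stated assumptions rather than the
procedure itself: the function names are hypothetical, the
label is assumed to be at the 5' end, and cleavage at an
AP-site is assumed to leave a labelled fragment whose length
equals the number of nucleotides preceding the excised base.

```python
def lane_fragment_lengths(sequence, base):
    """Lengths of 5'-end-labelled fragments produced by cleavage at `base`.

    A base at 0-based index i leaves a labelled fragment of i nucleotides
    (assumption); index 0 would leave nothing to detect and is skipped.
    """
    return [i for i, b in enumerate(sequence) if b == base and i > 0]


def read_pyrimidines(c_lengths, t_lengths, total_length):
    """Rebuild the pyrimidine positions from the CDG and TDG lanes.

    Positions not cut in either lane are reported as 'N' (undetermined).
    """
    calls = ["N"] * total_length
    for i in c_lengths:
        calls[i] = "C"
    for i in t_lengths:
        calls[i] = "T"
    return "".join(calls)


if __name__ == "__main__":
    fragment = "GATCCTAG"  # hypothetical end-labelled fragment
    c_lane = lane_fragment_lengths(fragment, "C")  # CDG-treated sample
    t_lane = lane_fragment_lengths(fragment, "T")  # TDG-treated sample
    print(c_lane, t_lane)
    print(read_pyrimidines(c_lane, t_lane, len(fragment)))
```

The undetermined positions correspond to the purines, which would
have to be located by a separate treatment such as the DMS/AlkA/Tag
approach mentioned above.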

Thus, a further aspect of the invention provides a
method of performing enzymatic DNA sequencing to
determine the position of cytosine and/or thymine bases
by treating said DNA with at least one CDG and/or TDG of
the invention.

The invention will now be described more
specifically in the following non-limiting Examples with
reference to the following drawings in which:

FIGURE 1 comprises graphs showing in vitro excision of
radiolabelled material from double stranded (ds) or
single stranded (ss) [3H]cytosine-labelled DNA substrate
(C-substrate) and [3H]thymine-labelled DNA substrate
(T-substrate) by human UDG-mutants (CDG: Panel A, TDG:
Panel B). The data represent mean values from two
independent experiments, each in duplicate, for each time
point. Symbols in panel B of Figure 1 are as indicated
in panel A;

FIGURE 2 comprises graphs showing analysis of the
radioactive excision products of substrate DNA by UDG
(Panel A) and UDG mutants TDG (Panel B) and CDG (Panel
C), performed by thin layer chromatography. U-substrate
is indicated by stars (*), other symbols are as in
Figure 1. The migration of unlabelled standards (the
free bases uracil, cytosine or thymine) is indicated as
rectangles, marked respectively U-marker, C-marker and
T-marker over the relevant fraction numbers;

FIGURE 3 shows a revised organisation of the human UNG
gene. The restriction maps with EcoRI, HindIII, SacI
and XbaI are indicated. Exons are shown as black boxes
and are numbered with Roman numerals. Exon 1A is a
previously unrecognised exon. Interspersed repeats are
indicated (-: Alu, •: MER, ♦: MIR, *: position of a 300
bp TA dinucleotide repeat);

FIGURE 4 shows the generation of human UNG1 and UNG2 by
transcription from two promoters and alternative
splicing. P2 is the previously recognised promoter for
transcription of UNG1 (Haug et al., 1994, FEBS Letters,
353, p180-184) and P1 the promoter from which UNG2 is
transcribed. Exon 1A encodes 44 amino acids present in
UNG2, but absent in UNG1. The 35 N-terminal codons of
exon 1B are only present in UNG1. The presequence of
UNG2 is shown on top with the putative nuclear
localization signal underlined. The presequence of UNG1
directing mitochondrial import is shown in the bottom
line;

FIGURE 5 shows the structure of the 5'-terminal part of
the human UNG gene (SEQUENCE I.D. No. 7). Bold letters
indicate exons (1A and 1B);

FIGURE 6 shows the alignment of UNG proteins from man
and mouse (SEQUENCE ID Nos 8 (hUNG1), 9 (mUNG1), 2
(hUNG2) and 10 (mUNG2)). Note that UNG1 and UNG2
proteins have been aligned separately down to the common
splice corresponding to codon 44 in human UNG2. The
presequence not present in the catalytically active form
of human placental uracil-DNA glycosylase originally
isolated, residues 1-77 in human UNG1 (Wittwer et al.,
1989, Biochemistry, 28, p780-784), is shown in bold
letters. Downstream of the alternative splice site (4)
used for generating UNG2 forms (from 45 in human UNG2),
the sequences of the two forms are identical in each
species. Residues that make up walls of the
uracil-binding pocket or which are directly involved in
catalysis are marked with a star (*). Residues that are
involved in DNA-binding (except those involved in
uracil-binding) are marked with a triangle (▼); and

FIGURE 7 shows the subcellular localization in HeLa
cells of UNG2-EGFP-N1 and UNG1-EGFP-N1 fusion products.
HeLa cells were transfected with constructs expressing
pUNG2-EGFP-N1 (C), pUNG1-EGFP-N1 (D) or the control
pEGFP-N1 (A), all expressed from the CMV promoter, and
processed for confocal microscopy. Panel B shows
staining of mitochondria with Texas red.


Example 1

Site directed mutagenesis of human UDG codons
Site directed mutagenesis was performed on the relevant
codons in human UDG and the proteins expressed in
Escherichia coli.


Methods 

Site-directed mutagenesis was carried out as in Mol et
al., 1995, supra. To obtain the Tyr147Ala mutant, codon
147, TAT→Tyr, was changed to GCT→Ala, and to obtain the
Asn204Asp mutant, codon 204, AAC→Asn, was changed to
GAC→Asp. Mutated DNA fragments were subcloned into the
human UDG expression construct pTUNGΔ84 by replacing
restriction fragments in the expression construct by
fragments containing the respective mutations. In
Escherichia coli, pTUNGΔ84 expresses high levels of a
fully active human UDG (UNGΔ84) lacking 7 non-essential
and non-conserved NH2-terminal residues of the mature
form of UDG (Mol et al., 1995, supra; Slupphaug et al.,
1995, supra). Expression of mutant proteins in
Escherichia coli and purification of the mutant proteins
to apparent homogeneity were carried out as described
previously (Mol et al., 1995, supra; Slupphaug et al.,
1995, supra). Relevant fractions were assayed for DNA
glycosylase activity during each step in the
purification. As a result of high expression,
purification may also take advantage of the UV
absorption of the enzymes. Peaks of UV absorption
corresponding to the enzyme of interest could already be
observed after only the first two column steps.
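
The two codon changes can be checked against the standard
genetic code with a short Python sketch. It is given only
as an illustration and assumes that the codons read TAT to
GCT at position 147 and AAC to GAC at position 204; the
table below holds just the four codons involved and the
function name is hypothetical.

```python
# Subset of the standard genetic code covering the codons discussed here.
CODON_TABLE = {"TAT": "Tyr", "GCT": "Ala", "AAC": "Asn", "GAC": "Asp"}


def describe_substitution(position, old_codon, new_codon):
    """Return the mutant name and the number of nucleotide changes needed."""
    changes = sum(1 for a, b in zip(old_codon, new_codon) if a != b)
    name = f"{CODON_TABLE[old_codon]}{position}{CODON_TABLE[new_codon]}"
    return name, changes


if __name__ == "__main__":
    print(describe_substitution(147, "TAT", "GCT"))
    print(describe_substitution(204, "AAC", "GAC"))
```

Under these assumptions the Tyr147Ala change needs two nucleotide
substitutions, whereas Asn204Asp needs only one.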

To test enzymatic substrate specificities, 250 ng
purified human "wild type" UDG (UNGΔ84), UNGΔ84Tyr147Ala
or UNGΔ84Asn204Asp were mixed with 200 ng ds- or
ss-[3H]cytosine-labelled DNA (150 mCi/mmol), or ds- or
ss-[3H]thymine-labelled DNA (100 mCi/mmol) in 10 mM NaCl,
20 mM Tris-HCl (pH 7.5), 1 mM EDTA, 1 mM dithiothreitol
and 0.5 mg/ml bovine serum albumin (final concentrations)
in separate 20 μl reactions. The final concentrations of
the [3H]cytosine-DNA (C) and [3H]thymine-DNA (T)
substrates were 6.5 μM and 10 μM, respectively. Release
of radioactivity as a function of time was measured at
37°C. These conditions are later referred to as
standard conditions. Substrate synthesis and processing
of samples for scintillation counting were as described
in Krokan and Wittwer (1981) Nucl. Acids Res., 9:
2599-2613. Single-stranded substrate was generated by
boiling double-stranded substrate for 10 min, followed
by rapid cooling on ice.

Results 

Figure 1 demonstrates time-dependent release of
acid-soluble radioactivity by homogeneous UNGΔ84Asn204Asp
(CDG) from [3H]cytosine-labelled DNA, but not from
[3H]thymine-labelled DNA. Conversely, the homogeneous
UNGΔ84Tyr147Ala (TDG) mutant releases acid-soluble
radiolabelled material from [3H]thymine-labelled DNA, but
not from [3H]cytosine-labelled DNA.

Example 2

Analysis of the radioactive excision products by thin
layer chromatography

Methods 

The analysis was performed using DC-cellulose as the
stationary phase and methanol:HCl:H2O = 70:20:10 as the
mobile phase. Samples were prepared as follows: 1.5 μg
enzyme (UNGΔ84, UNGΔ84Tyr147Ala or UNGΔ84Asn204Asp as
prepared in Example 1) was incubated with 1 μg
[3H]uracil-labelled DNA (500 mCi/mmol),
[3H]cytosine-labelled DNA (150 mCi/mmol) or
[3H]thymine-labelled DNA (100 mCi/mmol) in separate 50 μl
reactions under standard buffer conditions (see Example
1) for 1 hour. Macromolecules in the samples were then
ethanol precipitated, the supernatants after
centrifugation were collected, ethanol was removed by
evaporation and the remaining material was dissolved in
10 μl H2O. 1 μl was spotted on the membrane. After
migration the cellulose sheet was cut in strips and
radioactivity measured by scintillation counting in
Ready Protein scintillation cocktail.

Results 

Separation of the acid-soluble radioactive material by
thin layer chromatography (Figure 2) demonstrated that
the released material was the free bases [3H]cytosine or
[3H]thymine. Separation using another mobile phase
(butanol:H2O, 86:14) verified these results (data not
shown). In addition, both mutants release [3H]uracil,
whereas "wild type" UDG (UNGΔ84) releases [3H]uracil
only (Figure 2).

Example 3

Substrate specificity and uracil inhibition and kinetic
properties of UDG-mutants

Methods 

For measuring release of uracil from double-stranded
(U-ds) or single-stranded (U-ss) DNA, the various mutant
enzymes (prepared as described in Example 1 and by
analogous site-directed mutagenesis methods and
identical expression and purification methods) were
incubated with 200 ng ds- or ss-[3H]dUMP-labelled DNA
(500 mCi/mmol, 2 μM final concentration) in separate 20 μl
reactions for 10 min under standard conditions as
described in Example 1. For measuring release of
[3H]cytosine or [3H]thymine, assays were performed as
described in Example 1 using an incubation time of 10
min. Uracil inhibition was analysed by adding 5 mM
uracil (final concentration) to a standard U-ds assay.
An activity of 0 indicates activity below the detection
limit (10 pmol per mg protein per min) with 100 ng enzyme
and 200 ng DNA substrate at standard conditions.

The kinetic parameters were determined using six
different substrate concentrations to obtain the Km and
Vmax values. Duplicate samples were incubated for 20 min
using standard assay buffer conditions and substrates as
specified. Km and Vmax were calculated using the
computer program Enzpack, version 3.0, after the method
of Wilkinson (1961) Biochem. J., 80: 324-332. kcat was
calculated from Vmax assuming an Mr = 25000.
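
The conversion from Vmax to a turnover number can be written
out explicitly. The Python sketch below is illustrative and
assumes that Vmax is expressed as a specific activity in pmol
per min per mg protein (the units used for the activities in
Table 1); the function name is hypothetical. With Mr = 25000,
1 mg of enzyme corresponds to 4 x 10^4 pmol, so kcat (per
min) is Vmax divided by that amount.

```python
def kcat_per_min(vmax_pmol_per_min_per_mg, mr=25000.0):
    """Convert a specific activity (pmol/min/mg) into kcat (min^-1).

    1 mg protein = 1e-3 g; dividing by Mr (g/mol) gives mol of enzyme,
    and multiplying by 1e12 converts mol to pmol.
    """
    pmol_enzyme_per_mg = 1e-3 / mr * 1e12
    return vmax_pmol_per_min_per_mg / pmol_enzyme_per_mg


if __name__ == "__main__":
    # A hypothetical Vmax of 4 x 10^4 pmol/min/mg corresponds to one
    # catalytic event per enzyme molecule per minute.
    print(kcat_per_min(4.0e4))
```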


Results 

The results are shown in Tables 1 and 2. From Table 1
it can be seen that only the substitution Tyr147Ala
results in an enzyme which specifically excises thymine.
Similarly, only the substitution Asn204Asp results in a
mutant which excises cytosine. Both mutant enzymes
exhibit activity on single or double-stranded DNA and
are also able to excise uracil. From Table 2 it can be
seen that the turnover numbers of CDG and TDG are lower
than for "wild type" release of uracil.

Discussion 

These results demonstrate the significance of Asn204 for
specific binding of uracil-containing DNA and the
significance of the Tyr147 side chain ring structure for
preventing binding of thymine.

It is somewhat surprising that the novel CDG of the
invention still recognizes uracil, considering the
unfavourable proximity of the Asp carboxyl side chain
and the O4 atom in uracil. However, it should be noted
that the other oxygen atom of the Asp204 carboxyl side
chain may still form H-bonds with N3 of uracil and that
the Asp145 main chain carbonyl as well as the amide-N of
Asp145 and Gln144 also contribute to the specificity.
In addition, the UDG activity remaining is very low
(0.04-0.16%) compared with "wild type". CDG has a
10-fold increased preference for single stranded
substrate, whereas TDG has a decreased preference (Figure
1 and Table 1).
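
The ratios quoted above can be recomputed from the Table 1
activities. The Python sketch below is an illustrative
check only; the dictionary simply transcribes the relevant
Table 1 values (pmol excised per min per mg protein) and the
function names are hypothetical.

```python
# Activities transcribed from Table 1 (pmol excised per min per mg protein).
ACTIVITY = {
    "wild type": {"U-ds": 4.7e7, "U-ss": 9.5e7},
    "Asn204Asp": {"U-ds": 1.7e4, "U-ss": 1.6e5, "C-ds": 3.0e2, "C-ss": 3.0e3},
    "Tyr147Ala": {"U-ds": 2.2e4, "U-ss": 2.2e4, "T-ds": 1.3e3, "T-ss": 7.5e2},
}


def residual_udg_percent(mutant, substrate):
    """Uracil excision by a mutant as a percentage of wild type."""
    return 100.0 * ACTIVITY[mutant][substrate] / ACTIVITY["wild type"][substrate]


def ss_over_ds(mutant, base):
    """Ratio of activity on single-stranded over double-stranded substrate."""
    return ACTIVITY[mutant][f"{base}-ss"] / ACTIVITY[mutant][f"{base}-ds"]


if __name__ == "__main__":
    print(residual_udg_percent("Asn204Asp", "U-ds"))
    print(residual_udg_percent("Asn204Asp", "U-ss"))
    print(ss_over_ds("Asn204Asp", "C"))  # CDG preference for ss substrate
    print(ss_over_ds("Tyr147Ala", "T"))  # TDG preference for ss substrate
```

On these figures CDG retains a few hundredths to about a sixth of a
percent of wild type uracil excision and is ten times more active on
single stranded than on double stranded C-substrate, while TDG is
less active on single stranded than on double stranded T-substrate.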

It is evident that the turnover numbers (kcat) of the
novel enzymes releasing either cytosine or thymine, as
well as residual UDG activities, are very low when
compared with release of uracil by "wild type" UDG
(Table 2). However, the very high turnover number of
UDG appears to be unique among DNA glycosylases and
turnover numbers of other DNA glycosylases may be as
low as, or even lower than, those of the engineered
glycosylases CDG and TDG. Thus, a recent biochemical
characterisation of recombinant N-methylpurine-DNA
glycosylase from mouse gave kcat values of 0.8 min⁻¹ and
0.2 min⁻¹ for excision of 3-methyladenine and
7-methylguanine respectively (Roy et al. (1994)
Biochemistry, 33: 15131-15140).

The Escherichia coli inducible 3-methyladenine-DNA
glycosylase II (AlkA) is a DNA glycosylase that
recognizes at least 6 different damaged bases, among
these structurally different alkylated purines and
pyrimidines. The turnover number for AlkA on the
substrate 3-methyladenine-DNA is calculated to be 0.03
min⁻¹ (Bjelland et al. (1994) J. Biol. Chem., 269:
30489-30495).

The FPG protein (formamidopyrimidine-DNA glycosylase)
also has a rather low turnover number. The kcat value on
the imidazole ring-opened form of 7-methylguanine-DNA
substrate is calculated to be 1.4 min⁻¹ (Boiteux et al.
(1990) J. Biol. Chem., 265: 3916-3922). A low rate of
catalysis is also likely for the naturally occurring
T(U)/G-mismatch-DNA glycosylase since band shifts can be
demonstrated after mixing the enzyme with substrate
(Sassanfar & Roberts (1990) J. Mol. Biol., 212: 79-96).


All of these DNA glycosylases recognize at least two
different substrates, and in most cases several damaged
pyrimidines or purines. Probably the very high turnover
number of UDG reflects a high selectivity of substrate
binding in a tight fitting active site, allowing rapid
catalysis by this specialized enzyme. In contrast, the
DNA glycosylases with a broader substrate specificity
may bind substrate less accurately, and excise the base
more slowly.


TABLE 1

             pmol excised per min per mg protein                                            Inhibition by
Mutant       U-ds         U-ss         C-ds         C-ss         T-ds         T-ss         5 mM uracil (%)

"Wild type"  4.7 x 10^7   9.5 x 10^7   0            0            0            0            80
Gln144Leu    3.4 x 10^4   4.8 x 10^4   0            0            0            0            25
Asp145Glu    5.5 x 10^4   8.5 x 10^4   0            0            0            0            80
Asp145Asn    1.4 x 10^4   1.1 x 10^4   0            0            0            0            80
Tyr147Ala    2.2 x 10^4   2.2 x 10^4   0            0            1.3 x 10^3   7.5 x 10^2   0
Tyr147Phe    3.2 x 10^7   6.3 x 10^7   0            0            0            0            50
Ser169Ala    3.1 x 10^6   5.6 x 10^6   0            0            0            0            80
Asn204Asp    1.7 x 10^4   1.6 x 10^5   3.0 x 10^2   3.0 x 10^3   0            0            0
Asn204Gln    1.5 x 10^6   2.2 x 10^6   0            0            0            0            70
His268Leu    1.3x10*      2.6 x 10^5   0            0            0            0            75

TABLE 2

                                                  Substrate
             C-ds            C-ss            T-ds            T-ss            U-ds            U-ss
             Km      kcat    Km      kcat    Km      kcat    Km      kcat    Km      kcat    Km      kcat
Mutant       (μM)    (min⁻¹) (μM)    (min⁻¹) (μM)    (min⁻¹) (μM)    (min⁻¹) (μM)    (min⁻¹) (μM)    (min⁻¹)

"Wild type"  -       -       -       -       -       -       -       -       0.10    2500    0.06    5150
Tyr147Ala    -       -       -       -       6.0     0.06    1.4     0.02    3.5     1.0     0.30    0.6
Tyr147Phe    -       -       -       -       -       -       -       -       0.16    1225    0.10    2370
Asn204Asp    35      0.12    5.3     0.39    -       -       -       -       2.4     1.2     2.0     15
Asn204Gln    -       -       -       -       -       -       -       -       0.40    66      0.23    89


Example 4

Effects of TDG and CDG activity on frequency of
rifampicin resistant mutations in E. coli ung+ strain
(NR8051) and E. coli ung- strain (NR8052)

Overnight cultures of E. coli strains NR8051 and
NR8052 (both recA+, provided by Thomas Kunkel of the
National Institute of Environmental Health, USA)
containing plasmids pTrc99A, pTUNGΔ84, UNGΔ84Tyr147Ala
and UNGΔ84Asn204Asp, prepared as described in
Example 1, were grown in LB-medium with ampicillin (100
μg/ml) at 30°C. The culture was then diluted 1:20 in
fresh medium and cultured for 5 hours at 37°C. Induced
cultures contained 1 mM IPTG in the LB-medium. To
determine the number of rifampicin resistant bacteria,
100 μl of the culture were mixed with 3 ml top agarose,
poured on LB plates containing 100 μg/ml rifampicin
and incubated overnight at 37°C. Colonies were counted
and the number of rifampicin resistant colonies per 10^8
viable cells was calculated.
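
The final normalisation step can be sketched as follows. This
Python example is illustrative only: the function name is
hypothetical, and the viable cell titre is assumed to come
from a parallel count on non-selective plates, a step that
is not detailed above.

```python
def rif_resistant_per_1e8(colonies, plated_ml, viable_cells_per_ml):
    """Rifampicin resistant colonies per 10^8 viable cells.

    colonies            -- colonies counted on the rifampicin plate
    plated_ml           -- volume of culture plated (0.1 ml in the method)
    viable_cells_per_ml -- viable titre of the same culture (assumed to be
                           measured separately on non-selective plates)
    """
    viable_plated = plated_ml * viable_cells_per_ml
    if viable_plated <= 0:
        raise ValueError("number of viable cells plated must be positive")
    return colonies / viable_plated * 1e8


if __name__ == "__main__":
    # Hypothetical numbers: 44 colonies from 0.1 ml of a culture holding
    # 1 x 10^9 viable cells per ml.
    print(rif_resistant_per_1e8(44, 0.1, 1e9))
```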


Results

The results are shown in Table 3. These results
indicate that the expression of UDG does not cause an
increase in mutation frequencies (plasmid pTUNGΔ84
compared to parental pTrc99A). In fact, human UDG
complements E. coli ung- cells. This is clear from the
reduction in mutation frequencies from 4.4 to 1.3 when
UDG is present in induced cells. Uninduced cells are
also protected as a result of promoter leakage. In
contrast, the mutation frequencies of E. coli ung+
cells are increased by a factor of 8.6 and 39 when
carrying plasmids encoding CDG and TDG, respectively,
compared to host cells carrying the parental plasmid
pTrc99A. This increases to approximately 8.9 and 94.4,
respectively, in induced ung+ cells.

Discussion

Single amino acid substitutions transform the highly
uracil-selective uracil-DNA glycosylase into less
selective DNA glycosylases that attack normal
pyrimidines and confer a mutator phenotype upon the
cell, presumably because excess numbers of
apyrimidinic sites are formed.

It may seem surprising that propagation of plasmids
expressing CDG or TDG activity is at all possible,
since they might be expected to kill the host cells.
We believe that the relatively low turnover numbers and
the low expression in the absence of inducer (IPTG) are
sufficient to reduce the number of depyrimidinations to
a level that the DNA repair system can cope with.
Nevertheless, DNA degradation is detectable even in the
absence of inducer and is strongly increased when IPTG
is added (data not shown). The survival of Escherichia
coli recA+ host cells carrying uninduced CDG- or
TDG-plasmid is equal to that of the parental cell
carrying plasmid pTrc99A (data not shown), although
mutation frequencies are increased by a factor of 8.6 and
39 for CDG and TDG, respectively, as mentioned above.
Induction of CDG or TDG by IPTG reduces survival of the
Escherichia coli host cells (in both NR8051 and NR8052)
to less than 50% and 10%, respectively, within 5 hrs.
Thus, AP-site repair capacity is sufficient for repair
of damage caused by expression of CDG or TDG due to
"leakage" from the uninduced promoter. However, this
repair is apparently not complete, or may be
inaccurate, since the frequency of mutations leading to
rifampicin resistance is significantly increased by
induction with IPTG (Table 3). The activity of TDG in
vivo leads to a 10-fold higher mutation frequency in
Escherichia coli than the in vivo CDG activity. This
probably reflects the fact that TDG has a higher
activity on dsDNA than CDG, as demonstrated by in vitro
experiments with homogeneous enzyme (Figure 1), and
that the Km value for TDG on dsDNA is much lower than
the Km for CDG on dsDNA (Table 2). We have observed
that TDG and CDG are both highly cytotoxic in a recA-
background (Escherichia coli DH5α) even without
induction (data not shown). It is likely that this
cytotoxic effect is due to a lack of SOS-induction in
recA- cells. The chemical nature of the SOS-inducing
signal, or signals, is not fully known, and some DNA
lesions may indirectly activate the SOS response by
interfering with DNA replication (Sassanfar & Roberts,
1990, supra). If generation of AP-sites by TDG and CDG
directly or indirectly triggers SOS-induction, this
would increase cell survival, at the cost of error
prone repair and a high yield of mutations. CDG and
TDG should be very useful for exploring the biological
consequences of AP-sites in DNA.

The new DNA glycosylases that we have engineered are
distinctly different from previously known
glycosylases. The mismatch-specific thymine-DNA
glycosylase previously reported also releases uracil
(Sassanfar & Roberts, 1990, supra; Nedderman & Jiricny,
1993, supra), like the thymine-DNA glycosylase we have
constructed. However, while the naturally occurring
thymine-DNA glycosylase has an absolute requirement for
a mismatched U or T opposite a G, the TDG we have
engineered recognises T or U from T(U):A matches, as
well as from single stranded substrate. A DNA
glycosylase recognizing unmodified cytosine had
previously not been reported.

The mutator phenotype caused by a single amino acid
substitution is intriguing since it changes an enzyme
from its normal role in mutation avoidance into a
cytotoxic mutator protein. In the case of CDG this
change is the result of a single A→G transition, which
in vivo could be the result of several different
events, such as deamination of A, O4-alkylation of T in
the complementary strand, and replication errors.
Since this mutation would be dominant, only one allele
would need to be mutated to get a new phenotype. It is
possible, however, that this mutation would be lethal,
or that it would be without serious consequences due to
efficient repair of DNA in mammalian cells.
Nevertheless, the generation of repair enzymes having a
dominant mutator effect that would give the cells a
hypermutable phenotype may represent a new principle in
mutagenesis.
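The statement that the Asn-to-Asp change in CDG corresponds to a single purine-to-purine base change can be checked against the standard genetic code. The following is a minimal sketch, not part of the original experiments; the function names are illustrative only, and the codon sets are the standard-code codons for Asn and Asp.

```python
# Standard genetic code codons for the two residues involved.
ASN_CODONS = {"AAT", "AAC"}
ASP_CODONS = {"GAT", "GAC"}
PURINES = {"A", "G"}


def classify(ref, alt):
    """Transition = purine<->purine or pyrimidine<->pyrimidine; otherwise transversion."""
    return "transition" if (ref in PURINES) == (alt in PURINES) else "transversion"


def single_base_routes(src_codons, dst_codons):
    """List every codon pair that differs at exactly one position."""
    routes = []
    for src in sorted(src_codons):
        for dst in sorted(dst_codons):
            diffs = [(i, a, b) for i, (a, b) in enumerate(zip(src, dst)) if a != b]
            if len(diffs) == 1:
                pos, ref, alt = diffs[0]
                routes.append((src, dst, pos + 1, f"{ref}->{alt}", classify(ref, alt)))
    return routes


routes = single_base_routes(ASN_CODONS, ASP_CODONS)
for route in routes:
    print(route)
```

Under the standard code, every single-base route from an Asn codon to an Asp codon is an A to G change at the first codon position, which is a transition.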

TABLE 3

Frequency of rifR mutations per 10^8 cells

                            NR8051                    NR8052
Plasmid                Uninduced    Induced      Uninduced    Induced

pTrc99A                0.8 ± 0.3    0.9 ± 0.5    4.2 ± 1.2    4.4 ± 1.1
pTUNGΔ84               0.7 ± 0.4    0.8 ± 0.3    0.9 ± 0.2    1.3 ± 0.4
pTUNGΔ84Tyr147Ala      31 ± 8       85 ± 32      13 ± 5       57 ± 9
pTUNGΔ84Asn204Asp      6.9 ± 4.1    8.0 ± 3.4    2.1 ± 0.7    4.1 ± 1.1
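The factors of 8.6 and 39 quoted in the text for uninduced CDG and TDG follow from the uninduced NR8051 column of Table 3. A minimal sketch of that arithmetic is given below; the dictionary keys are labels chosen here for illustration, and only the mean values (not the standard deviations) are used.

```python
# Mean rifR mutation frequencies per 10^8 cells, NR8051, uninduced (Table 3).
uninduced_nr8051 = {
    "vector": 0.8,   # pTrc99A
    "TDG": 31.0,     # Tyr147Ala mutant
    "CDG": 6.9,      # Asn204Asp mutant
}

# Fold increase relative to the empty vector control.
fold = {
    name: value / uninduced_nr8051["vector"]
    for name, value in uninduced_nr8051.items()
    if name != "vector"
}

for name, factor in fold.items():
    print(f"{name}: {factor:.1f}-fold over vector")
```

The same ratio can be formed for the induced columns or for NR8052 by substituting the corresponding table values.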
Example 5

Effects of TDG activity on the frequency of rifampicin
resistant mutations in E. coli strains BW527 and
GW2100 (umuC)

An overnight culture of E. coli strain BW527 (endoIV-)
or GW2100 (umuC), provided by Erling Seeberg, The
National Hospital, Oslo, containing plasmid pTrc99A or
UNGΔ84Tyr147Ala (TDG), was prepared as described in
Example 1 and grown in LB-medium with ampicillin (100
µg/ml) at 30°C. The culture was then diluted 1:20 in
fresh medium and cultured as described in Example 4.

Results 

The results are shown in Table 4. These results
indicate that expression of UNGΔ84Tyr147Ala (TDG)
in E. coli strains BW527 (endoIV-) or GW2100 (umuC-)
gives a stronger mutagenic effect than in strains that
carry neither the defect in the repair of AP-sites nor
the defect in umuC, especially after induction with
IPTG. pTrc99A alone did not exert this effect to any
significant extent. Even more importantly, the
background mutation frequencies in these strains are
low and the effect of induction with IPTG is large,
thus improving the usefulness of UNGΔ84Tyr147Ala (TDG)
for mutagenesis when using more optimal strains.

These results are especially surprising in light of
previous findings that mutants in umuC are generally
difficult to mutate by some methods, for example by
UV-light or by chemical challenge.


TABLE 4

Effects of TDG-activity on frequency of rifampicin
resistant mutations in E. coli strains BW527 and GW2100

Frequency of rifR mutations per 10^8 cells

                            BW527                     GW2100
Plasmid                Uninduced    Induced      Uninduced                  Induced

pTrc99A                0.06±0.02    0.07±0.03    0.24×10^-3 ± 0.1×10^-3     6×10^-3 ± 5×10^-3
pTUNGΔ84Tyr147Ala      1.20±0.2     240±122      0.65±3829                  84±28

Example 6 

Isolation and characterisation of a nuclear form of
uracil-DNA glycosylase

Materials and Methods 

Materials 

Mouse embryonic carcinoma cDNA library, human liver
cDNA library and NT2 neuronal precursor cell cDNA
library were from Stratagene (La Jolla, CA, USA). All
libraries were propagated in the Uni-ZAP™XR vector
using XL-1 blue as host. [α-32P]dCTP, [35S]methionine,
Rediprime random labelling kit and Hybond N+ filters
were all from Amersham (UK). All sequencing primers
were from MedProbe (Oslo, Norway). Dye terminator
cycle sequencing ready reaction kit was from Applied
Biosystems (Foster City, CA). The Dynazyme PCR kit was
purchased from Finnzymes Oy (Espoo, Finland). TNT in
vitro transcription/translation rabbit reticulocyte
lysate system kit, pGEM-T TA cloning kit, Altered Sites
II in vitro Mutagenesis System, primers for sequencing
from T3 and T7 promoters and T3 RNA polymerase were
from Promega (Madison, WI). The plasmid encoding the
red-shifted variant of green fluorescent protein
(pEGFP-N1) was from Clontech (Palo Alto, CA, USA).
Restriction enzymes were from New England Biolabs Inc.
(Beverly, MA, USA).

Screening of cDNA libraries

All libraries were screened as recommended by the
manufacturer, using 32P-labelled UNG40 cDNA (Olsen et
al., 1989, EMBO J., 8, p 3121-3125) as probe.
Hybridization was carried out at 65°C overnight in 6 x
SSC, 5 x Denhardt's solution and 0.1% SDS. Filters
were washed in 0.1 x SSC/0.5% SDS at 65°C and
autoradiographed. Three rounds of screening were done.
In vivo excision of pBluescript phagemids from the Uni-
ZAP™XR vector was performed as recommended by the
manufacturer.

Sequence analysis of clones

Sequencing was performed on an Applied Biosystems Model
373A DNA Sequencing System using the Dye terminator
cycle sequencing ready reaction kit as recommended by
the manufacturer. The sequences were analysed using
the Auto Assembler software (Applied Biosystems).

In vitro transcription, uracil-DNA glycosylase assays
and transient transfection of HeLa cells for promoter
studies

In vitro transcription/translation was performed with
the TNT transcription/translation system with
[35S]methionine as recommended by the manufacturer,
using 200 ng of the expression constructs per 10 µl
reaction volume. The mouse UNG1-pBluescript construct
was transcribed from the T3 promoter in the pBluescript
vector. The insert of mouse UNG2-pBluescript was
amplified by the polymerase chain reaction using the
Dynazyme PCR kit, ligated into the pGEM-T vector and
transcribed from the T7 promoter. The human UNG2-
pBluescript was transcribed from the T3 promoter after
SacI/NheI excision of a 79 bp fragment from the
polylinker and the 5'-end of cDNA for UNG2. Human UNG1
cDNA was transcribed from the T7 promoter as previously
described (Slupphaug et al., 1995, Biochemistry, 34,
p 128-138). The samples were run on a 12% denaturing
sodium dodecyl sulfate polyacrylamide gel (SDS-PAGE).
The gel was dried, autoradiographed overnight and
scanned on an LKB Ultroscan XL Enhanced Laser
Densitometer. Uracil-DNA glycosylase activity was
measured in parallel samples of the in vitro
transcription/translation assay mixture containing
unlabelled amino acids (Slupphaug et al., 1995, supra).
A construct containing both promoters (pGL2-ProAB)
linked to the luciferase gene was prepared by insertion
of a PvuII/MluI fragment (the enzymes cleave in
positions 418 and 1035, respectively) from the promoter
region of the UNG gene into the SmaI-MluI sites of
pGL2-ProB. A promoter II-luciferase construct (pGL2-
ProB) and transient transfection with Transfectam
(Promega) have been described previously (Haug et al.,
1994, FEBS Letters, 353, p 180-184).

Preparation of pUNG-EGFP-N1 fusion constructs and
localization studies

UNG15 cDNA, which encodes UNG1, in pGEM7Zf+ (pUNG15)
(Slupphaug et al., 1995, supra; Olsen et al., 1989,
supra) was digested with BclI, which cuts at bp 1019 in
UNG15 cDNA, blunted with DNA polymerase I (Klenow
fragment), and ligated to an AgeI linker prepared from
the oligonucleotide 5'-ACCGGTGCC-3' and its
complementary copy. The religated pUNG15 containing
the AgeI linker correctly ligated into the BclI site
(verified by sequencing) was digested with RsrII, which
cuts at bp 49 in UNG15 cDNA (Olsen et al., 1989,
supra), blunted as above and finally digested with
AgeI. The fragment was then ligated into pEGFP-N1
digested with SmaI (blunt) and AgeI. The construct was


WO 97/25416 


PCT/GB97/00057 


sequenced to verify that the construct was in frame
with the ATG of the EGFP-N1 fusion protein. The TGA
stop codon of pUNG15 was changed to GGA by site-
directed mutagenesis performed according to the
procedure provided by the manufacturer using ssDNA
prepared with R408 phage. Potential pUNG1GGA-EGFP-N1
constructs were screened by digestion with BclI
(digests only unmutated plasmids) and verified by
sequencing. The correct construct was named pUNG1-
EGFP-N1. cDNA for UNG2 (this example) in pBluescript
was digested with NheI, which cuts 54 bp upstream of
ATG, and EcoNI, which cleaves the cDNAs in the sequence
that is shared by cDNAs for UNG1 and UNG2 (positions
529 and 520, respectively). The resulting fragment of
interest (501 bp) was isolated and ligated to the 5155
bp fragment of NheI/EcoNI-digested pUNG1-EGFP-N1 to
obtain pUNG2-EGFP-N1. Transient transfections of HeLa
cells were done with the CaPO4-method (Profection,
Promega) according to the manufacturer's
recommendations. Confocal microscopy (BioRad MRC-600)
of HeLa cells and staining of mitochondria with mouse
anti human mitochondria antibody (MAB 1273, Chemicon)
and Texas Red anti-mouse IgG (Vector) were performed as
previously described (Nagelhus et al., 1995, Exptl.
Cell Res., 220, p 292-297). Examination of HeLa cells
transfected with expression plasmids pEGFP-N1, pUNG1-
EGFP-N1 or pUNG2-EGFP-N1 was carried out using an
excitation wavelength of 488 nm and emission
wavelength >515 nm at 16 hours after transfection.
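The cloning logic above rests on two sequence facts: the linker oligonucleotide carries an AgeI site, and changing the TGA stop codon to GGA removes a BclI site, which is why BclI digestion distinguishes mutated from unmutated plasmids. The sketch below illustrates both points. It assumes the recognition sequences ACCGGT (AgeI) and TGATCA (BclI), and it uses nucleotides 1001 to 1020 of the cDNA sequence given in claim 1 as the stop codon region, on the assumption that UNG15 cDNA is identical to that sequence at this position. The variable names are illustrative only.

```python
AGEI_SITE = "ACCGGT"  # AgeI recognition sequence (assumed here)
BCLI_SITE = "TGATCA"  # BclI recognition sequence (assumed here)


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


# Linker oligonucleotide from the text and its complementary copy (5'->3').
linker_top = "ACCGGTGCC"
linker_bottom = reverse_complement(linker_top)

# Last codons of the ORF (K E L stop) plus flanking bases, read from the
# sequence in claim 1 (nucleotides 1001-1020); the stop codon starts at index 9.
stop_region = "AAGGAGCTGTGATCATCAGC"
stop_index = 9
assert stop_region[stop_index:stop_index + 3] == "TGA"

# Site-directed change of the stop codon TGA to GGA.
mutated = stop_region[:stop_index] + "GGA" + stop_region[stop_index + 3:]

print("linker bottom strand:", linker_bottom)
print("AgeI site in linker:", AGEI_SITE in linker_top)
print("BclI site before mutation:", BCLI_SITE in stop_region)
print("BclI site after mutation:", BCLI_SITE in mutated)
```

In this sketch the stop codon overlaps the BclI site, so the TGA to GGA change destroys the site while leaving the upstream codons untouched.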


Results 

A human NT2 neuronal precursor cell cDNA library and a
mouse embryonic carcinoma cDNA library were screened,
and a new form of human uracil-DNA glycosylase (human
UNG2) encoded by the UNG gene, as well as the
homologous cDNA from mouse (mouse UNG2), was identified.
In addition, the cDNA for the mouse homolog (encoding
mouse UNG1) of human UNG1 (Olsen et al., 1989, supra)
was identified. cDNA for human UNG2 has an ORF
encoding 44 N-terminal amino acids not found in human
UNG1, whereas cDNA for human UNG1 has an ORF encoding 35
amino acids not found in human UNG2 (Figure 4). The
two forms are identical in the rest of the amino acid
presequence, which is not required for enzyme activity,
as well as in the catalytic domain, altogether 269
identical consecutive amino acids. The sequence of the
269 amino acids common to UNG1 and UNG2, and the
corresponding DNA sequence, is identical to amino acid
residues 35-304 in Olsen et al., 1989, supra. cDNAs
for human UNG2 and its mouse homolog are apparently as
abundant as UNG1 in cDNA libraries from proliferating
cells, since among 20 cDNA clones that were sequenced 10
were of the UNG2 type and 10 were similar to the
previously known UNG1 type. Among 4 mouse cDNA
sequences, 3 were of the UNG2 type and 1 was of the
UNG1 type. However, screening of a human hepatocyte
library with UNG40 cDNA resulted in the isolation of 80
strongly hybridizing clones, and sequencing of 14 of
these demonstrated that they were all similar to the
previously characterized cDNA for UNG1 or the cDNA
UNG40 (Olsen et al., 1989, supra).

Comparison of the human cDNA for UNG2 with the recently
published complete human UNG sequence (Haug et al.,
1996, Genomics, 36, p 408-416) revealed the presence of
a previously unrecognised exon (exon 1A) located some
650 base pairs upstream of the previously identified
exon 1 (hereinafter called exon 1B). A revised
organization of the UNG gene is therefore presented in
Figure 3. Exon 1B forms the leader sequence and codons
1-104 of the mRNA encoding the previously known form
UNG1. The mRNA corresponding to the new human cDNA is
formed by joining exon 1A (encoding 44 amino acids)
into a consensus splice site after codon 35 in exon 1B,
after which the two human cDNAs are identical. The
open reading frame of human UNG2 cDNA predicts a
protein of 313 amino acids, as compared to 304 amino
acids for UNG1. Genomic clones for the mouse homolog
of the UNG gene have also been isolated and sequenced.
This has revealed that the splice sites for exons 3, 4,
5 and 6 in the UNG genes from mouse and man are in
identical positions. Furthermore, PCR analyses have
demonstrated that the rest of the mouse gene is
structurally similar to the human gene, as expected
from the cDNA clones (data not shown).

Figure 4 shows how the alternative forms of mRNA for
UNG1 and UNG2 arise as deduced from human cDNAs and the
corresponding UNG sequences, and indicates the presence
of a putative nuclear localization signal of 4 basic
residues (RKRH) in the N-terminal end of the new cDNA
and putative mitochondrial localization signals in cDNA
for UNG1. In addition, and not shown here, both human
cDNAs contain a putative nuclear localization signal
(RKRHH) in the catalytic domain (residues 258-262 in
the ORF of cDNA for UNG1). These residues are located
at the surface of the enzyme between α-helix 7 and
β-strand 4 (Mol et al., 1995, Cell, 80, p 869-878).

Figure 5 shows the genomic structure of exons 1A and
1B, as well as the structure of the previously
characterized promoter (hereinafter called promoter
II), possible elements in the putative promoter
upstream of exon 1A (hereinafter called promoter I) and
the alternative splice acceptor site (SEQUENCE I.D. No.
7). Promoter I probably starts after the 3'-terminal
end of two Alu-repeats (position 425) and ends
immediately upstream of the start of exon 1A. However,
it cannot be excluded that the promoter is located
upstream of the Alu-repeats. This would require the
presence of an exon encoding a leader that would be
joined to exon 1A. This is considered unlikely since
promoter motifs upstream of the Alu-repeats have not
been detected and, furthermore, transcripts of the
required size have also not been detected by Northern
analyses (data not shown). Furthermore, the cDNA for
UNG2 does not contain sequences from this upstream
region.

Figure 6 shows an alignment of predicted amino acid
presequences of the human and mouse enzymes (SEQUENCE
I.D. Nos 2 and 8-10). Note that UNG1 proteins and UNG2
proteins have been aligned separately in the parts of
the proteins that are derived from different exons (up
to codon 45 in human UNG2). Table 4 shows the % of
identical residues in the different forms, using human
UNG2 as the reference (100%). The parts of the protein
that are not required for catalytic activity are less
well conserved than the catalytic domain. Amino acids
that have been found to be critical for catalytic
activity or formation of the uracil-binding pocket (Mol
et al., 1995, supra; Kavli et al., 1996, EMBO J., 15,
p 3442-3447) or DNA binding are completely conserved in
mouse (residues Q144, D145, P146, Y147, F158, S169,
N204, S247, H268, S270, L272, S273, Y275 and R276 in
UNG1).

To compare the promoter activity of promoter II alone
with that of promoter I and promoter II in combination,
promoter-luciferase gene constructs were prepared and
transient transfection experiments performed with HeLa
cells. These studies verified the promoter activity of
promoter II alone (Haug et al., 1994, supra) and
further demonstrated that when both promoters are
present in the construct, the luciferase activity
increased some 50%, indicating that promoter I is also
active in HeLa cells, as expected from the abundance of
the new cDNA in proliferating cells (Table 5).
Coupled transcription-translation of the two forms of
human and mouse cDNA resulted in easily measurable
uracil-DNA glycosylase activity for both forms from
mouse and man. For calculations of the relative
specific activities, the radioactivity released in
uracil-DNA glycosylase assays was compared to band
intensities on an SDS-PAGE gel from
transcription/translation reactions using
[35S]methionine (Table 6).

To examine whether human UNG1 and UNG2 were
translocated to different subcellular compartments,
constructs expressing fusion proteins of the UNG
proteins and a red-shifted variant of green fluorescent
protein (EGFP-N1) were prepared. These were used for
transient transfection experiments with HeLa cells.
The major advantage of the green fluorescent protein
(over the use of antibodies) is that this method relies
on the autofluorescence of this protein alone, and thus
possible cross reaction of the antibody with epitopes
in irrelevant proteins is not a problem. The control
(pEGFP-N1) shows that the green fluorescent protein
displays a homogeneous staining over the cells (Figure
7A). In contrast, the UNG2-EGFP-N1 fusion protein is
exclusively located in the nuclei (Figure 7C) and the
UNG1-EGFP-N1 fusion protein (Figure 7D) is mainly, if
not exclusively, located in extranuclear spots that
have the same appearance as mitochondria stained with
Texas Red (Figure 7B). These results provide
convincing experimental evidence that UNG2 is a nuclear
protein and UNG1 a mitochondrial protein.
Table 4. Conservation of amino acids in four homologs
of uracil-DNA glycosylase calculated as % identity with
human UNG2*

                        % identity of domains

Variant   Presequence#   Common        Catalytic   Overall
                         presequence   domain      identity
          (1-44)         (45-63)       (64-313)    (1-313)

hUNG1     2              100           100         90
mUNG2     64             75            91          86
mUNG1     2              75            91          79

* The identity is calculated for the domains in UNG2
compared with the corresponding domains in the other
forms. # The identity of the presequences of hUNG1 and
mUNG1 is 27% with 82% identity overall.
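Percent identity values of the kind reported in this table are obtained by counting identical residues column by column in an alignment. The sketch below is an illustration of that calculation, not the procedure actually used by the inventors. The function name is hypothetical; columns containing a gap in either sequence are counted as non-identical, which is an assumption made here. The two example segments are the human and mouse rows for the block ending at residue 135/126 and 128/117 of the alignment figure, read from the figure page as well as the extraction allows.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent of alignment columns holding the same residue in both sequences.

    Both inputs must already be aligned (equal length). Columns with a gap in
    either sequence count as non-identical (an assumption of this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    if not seq_a:
        raise ValueError("empty alignment")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b and a != gap)
    return 100.0 * matches / len(seq_a)


# One 45-residue block of the catalytic region, human versus mouse.
human_block = "PVGFGESWKKHLSGEFGKPYFIKLMGFVAEERKHYTVYPPPHQVF"
mouse_block = "PAGFGESWKQQLCGEFGKPYFVKLMGFVAEERNHHKVYPPPEQVF"

print(f"{percent_identity(human_block, mouse_block):.1f}% identical in this block")
```

Applied to whole domains rather than a single block, the same function would give one figure per column of the table.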

Table 5. Promoter activities in the UNG gene*

Promoter-reporter      Luciferase activity
gene construct         %

pGL2-Basic             0.8±0.4
pGL2-ProB              100±8
pGL2-ProAB             156±4

* The promoter activity of pGL2-ProB
(promoter II) was arbitrarily set to 100%.
pGL2-Basic is a control lacking promoter.
Table 6. Relative specific activities of different
forms of UNG after translation in rabbit reticulocyte
lysates*

Protein        dpm*     Area (mm2)    Activity
                                      (dpm/area)

human UNG1     1291     0.054         23907
human UNG2     6360     0.268         23731
mouse UNG1     921      0.061         15098
mouse UNG2     856      0.051         16784

* Relative specific activities were calculated from
measured dpm-values (3H-uracil released in uracil-DNA
glycosylase assays) and areas under the curve of
scanned bands on SDS-PAGE gels after subtraction of
background values of 123 dpm.
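The activity column of Table 6 is the ratio of released radioactivity to band area. The following sketch reproduces that division as a consistency check. It assumes that the dpm values listed are already background-corrected, as the footnote suggests, and that the third row of the table refers to mouse UNG1; the dictionary layout is illustrative only.

```python
# (dpm, band area in mm^2) per protein, as read from Table 6.
measurements = {
    "human UNG1": (1291, 0.054),
    "human UNG2": (6360, 0.268),
    "mouse UNG1": (921, 0.061),   # row label assumed to be mouse UNG1
    "mouse UNG2": (856, 0.051),
}

# Relative specific activity = dpm released per unit band area.
specific_activity = {
    protein: dpm / area for protein, (dpm, area) in measurements.items()
}

for protein, activity in specific_activity.items():
    print(f"{protein}: {activity:.0f} dpm/area")

# Ratio of the nuclear to the mitochondrial form within each species.
human_ratio = specific_activity["human UNG2"] / specific_activity["human UNG1"]
mouse_ratio = specific_activity["mouse UNG2"] / specific_activity["mouse UNG1"]
print(f"UNG2/UNG1 ratio: human {human_ratio:.2f}, mouse {mouse_ratio:.2f}")
```

On these numbers the two forms have similar specific activity within each species, which is what the table is intended to show.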
Claims

1. A DNA glycosylase capable of releasing cytosine
bases from single stranded (ss) DNA and/or double
stranded (ds) DNA (cytosine-DNA glycosylase) or thymine
bases from both single stranded (ss) DNA and double
stranded (ds) DNA or from single stranded (ss) DNA
(thymine-DNA glycosylase) or uracil bases from single
stranded (ss) DNA and/or double stranded (ds) DNA
(uracil-DNA glycosylase), wherein said uracil-DNA
glycosylase is encoded by a nucleic acid molecule
comprising the sequence (SEQUENCE I.D. Nos 1 and 2):

   1 CACAGCCACA GCCAGGGCTA GCCTCGCCGG TTCCCGGGTG GCGCGCGTTC GCTGCCTCCT

  61 CAGCTCCAGG ATGATCGGCC AGAAGACGCT CTACTCCTTT TTCTCCCCCA GCCCCGCCAG
                MIG        QKT        LYSF       FSP        SPA

 121 GAAGCGACAC GCCCCCAGCC CCGAGCCGGC CGTCCAGGGG ACCGGCGTGG CTGGGGTGCC
     RKRH       APS        PEP        AVQG       TGV        AGV

 181 TGAGGAAAGC GGAGATGCGG CGGCCATCCC AGCCAAGAAG GCCCCGGCTG GGCAGGAGGA
     PEES       GDA        AAI        PAKK       APA        GQE

 241 GCCTGGGACG CCGCCCTCCT CGCCGCTGAG TGCCGAGCAG TTGGACCGGA TCCAGAGGAA
     EPGT       PPS        SPL        SAEQ       LDR        IQR

 301 CAAGGCCGCG GCCCTGCTCA GACTCGCGGC CCGCAACGTG CCCGTGGGCT TTGGAGAGAG
     NKAA       ALL        RLA        ARNV       PVG        FGE

 361 CTGGAAGAAG CACCTCAGCG GGGAGTTCGG GAAACCGTAT TTTATCAAGC TAATGGGATT
     SWKK       HLS        GEF        GKPY       FIK        LMG

 421 TGTTGCAGAA GAAAGAAAGC ATTACACTGT TTATCCACCC CCACACCAAG TCTTCACCTG
     FVAE       ERK        HYT        VYPP       PHQ        VFT

 481 GACCCAGATG TGTGACATAA AAGATGTGAA GGTTGTCATC CTGGGACAGG ATCCATATCA
     WTQM       CDI        KDV        KVVI       LGQ        DPY

 541 TGGACCTAAT CAAGCTCACG GGCTCTGCTT TAGTGTTCAA AGGCCTGTTC CGCCTCCGCC
     HGPN       QAH        GLC        FSVQ       RPV        PPP

 601 CAGTTTGGAG AACATTTATA AAGAGTTGTC TACAGACATA GAGGATTTTG TTCATCCTGG
     PSLE       NIY        KEL        STDI       EDF        VHP

 661 CCATGGAGAT TTATCTGGGT GGGCCAAGCA AGGTGTTCTC CTTCTCAACG CTGTCCTCAC
     GHGD       LSG        WAK        QGVL       LLN        AVL

 721 GGTTCGTGCC CATCAAGCCA ACTCTCATAA GGAGCGAGGC TGGGAGCAGT TCACTGATGC
     TVRA       HQA        NSH        KERG       WEQ        FTD

 781 AGTTGTGTCC TGGCTAAATC AGAACTCGAA TGGCCTTGTT TTCTTGCTCT GGGGCTCTTA
     AVVS       WLN        QNS        NGLV       FLL        WGS

 841 TGCTCAGAAG AAGGGCAGTG CCATTGATAG GAAGCGGCAC CATGTACTAC AGACGGCTCA
     YAQK       KGS        AID        RKRH       HVL        QTA

 901 TCCCTCCCCT TTGTCAGTGT ATAGAGGGTT CTTTGGATGT AGACACTTTT CAAAGACCAA
     HPSP       LSV        YRG        FFGC       RHF        SKT

 961 TGAGCTGCTG CAGAAGTCTG GCAAGAAGCC CATTGACTGG AAGGAGCTGT GATCATCAGC
     NELL       QKS        GKK        PIDW       KEL

1021 TGAGGGGTGG CCTTTGAGAA GCTGCTGTTA ACGTATTTGC CAGTTACGAA GTTCCACTGA
1081 AAATTTTCCT ATTAATTCTT AAGTACTCTG CATAAGGGGG AAAAGCTTCC AGAAAGCAGC
1141 CATGAACCAG GCTGTCCAGG AATGGCAGCT GTATCCAACC ACAAACAACA AAGGCTACCC
1201 TTTGACCAAA TGTCTTTCTC TGCAACATGG CTTCGGCCTA AAATATGCAG AAGACAGATG
1261 AGGTCAAATA CTCAGTTGGC TCTCTTTATC TCCCTTGCCT TTATGGTGAA ACAGGGGAGA
1321 TGTGCACCTT TCAGGCACAG CCCTAGTTTG GCGCCTGCTG CTCCTTGGTT TTGCCTGGTT
1381 AGACTTTCAG TGACAGATGT TGGGGTGTTT TTGCTTAGAA AGGTCCCCTT GTCTCAGCCT
1441 TGCAGGGCAG GCATGCCAGT CTCTGCCAGT TCCACTGCCC CCTTGATCTT TGAAGGAGTC
1501 CTCAGGCCCC TCGCAGCATA AGGATGTTTT GCAACTTTCC AGAATCTGGC CCAGAAATTA
1561 GGGCTCAATT TCCTGATTGT AGTAGAGGTT AAGATTGCTG TGAGCTTTAT CAGATAAGAG
1621 ACCGAGAGAA GTAAGCTGGG TCTTGTTATT CCTTGGGTGT TGGTGGAATA AGCAGTGGAA
1681 TTTGAACAAG GAAGAGGAGA AAAGGGAATT TTGTCTTTAT GGGGTGGGGT GATTTTCTCC
1741 TAGGGTTATG TCCAGTTGGG GTTTTTAAGG CAGCACAGAC TGCCAAGTAC TGTTTTTTTT
1801 AACCGACTGA AATCACTTTG GGATATTTTT TCCTGCAACA CTGGAAAGTT TTAGTTTTTT
1861 AAGAAGTACT CATGCAGATA TATATATATA TATTTTTCCC AGTCCTTTTT TTAAGAGACG
1921 GTCTTTATTG GGTCTGCACC TCCATCCTTG ATCTTGTTAG CAATGCTGTT TTTGCTGTTA
1981 GTCGGGTTAG AGTTGGCTCT ACGCGAGGTT TGTTAATAAA AGTTTGTTAA AAGTTCAAAA
2041 AAAAAAAAAA AAA


or a fragment thereof encoding a catalytically active
product comprising at least nucleotides 121 to 130 in
addition to the catalytic domain, or a sequence which
is degenerate, substantially homologous with or which
hybridizes with at least nucleotides 121 to 130 of any
such aforesaid sequence.

2. A cytosine-DNA glycosylase (CDG) as claimed in
claim 1.

3. A cytosine-DNA glycosylase (CDG) as claimed in
claim 1 or 2 wherein said CDG is capable of releasing
both cytosine and uracil bases from ssDNA and/or dsDNA.

4. A cytosine-DNA glycosylase (CDG) as claimed in any
one of claims 1 to 3 wherein said CDG is derived from
UDG.

5. A CDG as claimed in claim 4 wherein Asn at amino
acid position 204 in human UDG protein or equivalent
residue in other species is substituted or modified.

6. A CDG as claimed in claim 5 wherein said Asn or
equivalent residue is replaced with an aspartic acid
residue (Asp).

7. A thymine-DNA glycosylase (TDG) as claimed in claim
1.

8. A thymine-DNA glycosylase (TDG) as claimed in claim
1 or 7 wherein said TDG is capable of releasing both
thymine and uracil bases from both ssDNA and dsDNA.

9. A thymine-DNA glycosylase (TDG) as claimed in any
one of claims 1, 7 or 8 wherein said TDG is capable of
releasing thymine bases from A:T DNA pairs.

10. A thymine-DNA glycosylase (TDG) as claimed in any
one of claims 1, 7 or 8 wherein said TDG is capable of
releasing thymine bases from single stranded DNA.

11. A thymine-DNA glycosylase (TDG) as claimed in any
one of claims 1 or 6 to 9 wherein said TDG is derived
from UDG.

12. A TDG as claimed in claim 11 wherein Tyr at amino
acid position 147 in human UDG protein or equivalent
residue in other species is substituted or modified.

13. A TDG as claimed in claim 12 wherein said Tyr or
equivalent residue is replaced with an alanine residue
(Ala).

14. A CDG or TDG as claimed in any one of claims 4 to
6 or 11 to 13 wherein said UDG is human.

15. A uracil-DNA glycosylase (UDG) as claimed in claim
1.

16. A uracil-DNA glycosylase (UDG) as claimed in claim
1 or 15 wherein said fragment comprises at least
nucleotides 71 to 202 and/or degenerate, substantially
homologous and hybridizing sequences are degenerate,
substantially homologous with or hybridize with at
least nucleotides 71 to 202.

17. A UDG as claimed in claim 16 wherein degenerate,
substantially homologous and hybridizing sequences are
degenerate, substantially homologous with or hybridize
with the entire sequence.

18. Nuclear localization peptides encoded by a nucleic
acid molecule comprising the sequence (SEQUENCE I.D.
Nos 3 and 4):

           ATGATCGGCC AGAAGACGCT CTACTCCTTT TTCTCCCCCA GCCCCGCCAG
           MIG        QKT        LYSF       FSP        SPA

GAAGCGACAC GCCCCCAGCC CCGAGCCGGC CGTCCAGGGG ACCGGCGTGG CTGGGGTGCC
RKRH       APS        PEP        AVQG       TGV        AGV

TGAGGAAAGC GGAGATGCGG CG
PEES       GDA        A

or a fragment thereof encoding a functional equivalent
or a sequence which is degenerate, substantially
homologous with or which hybridizes with any such
aforesaid sequence.

19. Nuclear localizing peptides as claimed in claim 18
wherein said peptides include the amino acid sequence
RKRH.
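The relationship between the nucleotide sequence of claim 18 and the peptide printed beneath it can be checked by translating the sequence with the standard genetic code. The sketch below does this and locates the RKRH motif named in claim 19. The compact codon table construction and the variable names are choices made for this illustration; the nucleotide string is the 132-nucleotide sequence of claim 18 with the spacing removed.

```python
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate an in-frame DNA string; a trailing partial codon is ignored."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Sequence of claim 18 (nucleotides 71 to 202 of the claim 1 sequence).
nls_dna = (
    "ATGATCGGCC AGAAGACGCT CTACTCCTTT TTCTCCCCCA GCCCCGCCAG "
    "GAAGCGACAC GCCCCCAGCC CCGAGCCGGC CGTCCAGGGG ACCGGCGTGG CTGGGGTGCC "
    "TGAGGAAAGC GGAGATGCGG CG"
).replace(" ", "")

peptide = translate(nls_dna)
motif_start = peptide.find("RKRH") + 1  # 1-based residue number

print(peptide)
print(f"RKRH begins at residue {motif_start}")
```

The translation gives the 44 N-terminal residues unique to UNG2, with the RKRH motif at residues 17 to 20.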

20. A DNA glycosylase as claimed in any one of claims
1 to 17 which additionally comprises at least one
nuclear localization peptide sequence as defined in
claim 18 or 19 or at least one mitochondrial
localization peptide sequence encoded by a nucleic acid
molecule comprising the sequence (SEQUENCE I.D. Nos 5
and 6):

ATGGGCGTCT TCTGCCTTGG GCCGTGGGGG TTGGGCCGGA AGCTGCGGAC GCCTGGGAAG
MGV        FCL        GPWG       LGR        KLR        TPGK

GGGCCGCTGC AGCTCTTGAG CCGCCTCTGC GGGGACCACT TGCAG
GPL        QLL        SRLC       GDH        LQ

or a fragment thereof encoding a functional equivalent
or a sequence which is degenerate, substantially
homologous with or which hybridizes with any such
aforesaid sequence.

21. An assay for the identification of DNA
glycosylases as defined in any one of claims 1 to 14 in
a sample, in which said assay comprises at least the
step of assaying for activity in the sample which is
capable of excising thymine or cytosine and optionally
also uracil from an introduced ssDNA and/or dsDNA
substrate.

22. A nucleic acid molecule comprising a nucleotide
sequence which encodes a DNA glycosylase and/or nuclear
localizing peptide as defined in any one of claims 1 to
20.

23. An expression vector containing a nucleic acid
molecule as defined in claim 22.

24. A transformed or transfected host cell carrying a
nucleic acid molecule as defined in claim 22.

25. Use of one or more DNA glycosylases as defined in
any one of claims 1 to 17 in in vitro and/or in vivo
mutagenesis systems.

26. Use of one or more DNA glycosylases as defined in
any one of claims 1 to 14 for removing contaminating
DNA prior to PCR amplification.

27. Use of one or more DNA glycosylases as defined in
any one of claims 1 to 14 in DNA modification.

28. A method of killing cells, comprising the steps of
introducing a DNA glycosylase as defined in any one of
claims 1 to 17 into a cell and expressing said DNA
glycosylase in the cell to an extent which results in
the killing of that cell.

29. A method of performing enzymatic DNA sequencing to
determine the position of cytosine and/or thymine bases
by treating said DNA with at least one CDG and/or TDG
as defined in any one of claims 1 to 14.
Figure 1

Figure 4
hUNG1 MGVFCLGPWGLGRKLRTPGKGPLQLLSRLCGDHLQA 36

mUNG 1 MGV LGRRSLR — LAKRAGLRSL TPNPDSDSRQ& 3 1 

hUNG2 MIGQKTLYSFFSPSPARKRHAPSPEPAVQGTGVAGVPEESGDAAA 45

mUNG2 MTG QKTI* YSFFSPTPTGKR TTRSPEP - VPGSGVAA — SXGGDAVA 4 2 

hUNG2/1  IPAKKAPAGQEEPGTPPSSPLSAEQLDRIQRNKAAALLRLAARNV   90/81
mUNG2/1  SPAKKARVEQNEQG    SPLSAEQLVRIQRNKAAALLRLAARNV   83/72

hUNG2/1  PVGFGESWKKHLSGEFGKPYFIKLMGFVAEERKHYTVYPPPHQVF  135/126
mUNG2/1  PAGFGESWKQQLCGEFGKPYFVKLMGFVAEERNHHKVYPPPEQVF  128/117

hUNG2/1  TWTQMCDIKDVKVVILGQDPYHGPNQAHGLCFSVQRPVPPPPSLE  180/171
mUNG2/1  TWTQMCDIRDVKVVILGQDPYHGPNQAHGLCFSVQRPVPPPPSLE  173/162

hUNG2/1  NIYKELSTDIEDFVHPGHGDLSGWAKQGVLLLNAVLTVRAHQANS  225/216
mUNG2/1  NIFKELSTDIDGFVHPGHGDLSGWARQGVLLLNAVLTVRAHQANS  218/207

hUNG2/1  HKERGWEQFTDAVVSWLNQNSNGLVFLLWGSYAQKKGSAIDRKRH  270/261
mUNG2/1  HKERGWEQFTDAVVSWLNQNLSGLVFLLWGSYAQKKGSVIDRKRH  263/252

hUNG2/1  HVLQTAHPSPLSVYRGFFGCRHFSKTNELLQKSGKKPIDWKEL    313/304
mUNG2/1  HVLQTAHPSPLSVYRGFLGCRHFSKANELLQKSGKKPINWKEL    306/295